CESD63: Fundamentals of data analytics

About This Specialization
Do you need to understand big data and how it will impact your business? This Specialization is for professionals with a background in engineering, computer science, statistics, mathematics, economics, or management. You will gain an understanding of what insights big data can provide through hands-on experience with the tools and systems used by big data scientists and engineers. Previous programming experience is not required! You will be guided through the basics of using Hadoop with MapReduce, Spark, Pig and Hive. By following along with provided code, you will experience how one can perform predictive modeling and leverage graph analytics to model problems. This Specialization will prepare you to ask the right questions about data, communicate effectively with data scientists, and do basic exploration of large, complex datasets.

Course Duration:
August 14th, 2017 – September 3rd, 2017

Course Fee: Rs. 5000/– (Rs. 3000/– for S.P.I.T. students)
Last date for enrollment: 31st July, 2017

About This Course

Interested in increasing your knowledge of the Big Data landscape? This course is for those new to data science and interested in understanding why the Big Data Era has come to be. It is for those who want to become conversant with the terminology and the core concepts behind big data problems, applications, and systems. It is for those who want to start thinking about how Big Data might be useful in their business or career. It provides an introduction to one of the most common frameworks, Hadoop, that has made big data analysis easier and more accessible — increasing the potential for data to transform our world!
More and more organizations these days use their data as a decision-support tool and to build data-intensive products and services. The collection of skills required by organizations to support these functions has been grouped under the term “Data Sciences”. This course will cover the basic concepts of big data and methodologies for analyzing structured and unstructured data, with emphasis on the relationship between the Data Scientist and the business needs. The course is intended for students with a background in engineering, computer science, statistics, mathematics, economics, or management.

Data Analytics is the science of analyzing data to convert information into useful knowledge. This knowledge can help us understand our world better and, in many contexts, enable us to make better decisions. While this is the broad and grand objective, the last 20 years have seen steeply decreasing costs to gather, store, and process data, creating an even stronger motivation for the use of empirical approaches to problem solving. This course seeks to present you with a wide range of data analytic techniques and is structured around the broad contours of the different types of data analytics, namely descriptive, inferential, predictive, and prescriptive analytics.

This will be an intensive 40-hour training course covering the concepts of Big Data analytics and the associated technology. The course includes one capstone project. A capstone project is a larger project designed to help you practice, apply, and showcase the skills you’ve learned.

This course has a total of six modules, divided into various sub-modules as follows:

Fundamentals of data analytics
Prediction analytics: regression and classification
Prediction analytics: Time series, Decision Tree and Neural Network
Big Data analytics: Hadoop and Spark
Big Data analytics: Scala
Data analytics applications

What Background Knowledge is necessary?
This course is for those new to data science. No prior programming experience is needed, although the ability to install applications and utilize a virtual machine is necessary to complete the hands-on assignments.

Industries where this course will be useful
Several companies around the world are recruiting data scientists and analysts with knowledge of and skills in Big Data technologies.

Course Outcomes

To understand algorithmic, theoretical and computational approaches to Big Data.
To understand regression and classification with respect to Big Data.
To understand data mining algorithms with respect to Big Data.
To understand Hadoop and Spark for data analytics.
To use Scala for Big Data analytics.
To design and develop applications in domains as diverse as genomics, medicine, healthcare, clinical, biological and neuro-informatics, robotics, natural language processing, etc.

Course Creators/Instructors

Dr. Sudhir N. Dhage, Ph.D.(Technology) in Computer Engineering, VJTI, Mumbai.
Professor, Department of Computer Engineering, SPIT, Mumbai.
Dr. Lalit, Ph.D.(Technology) in Computer Science and Engineering, IIT(BHU), Varanasi.
Scientist/E, R&D- Computer, Bhabha Atomic Research Centre (BARC), Government of India, Mumbai.
Prof. Pramod Bide, M.E. Computer Engineering
Assistant Professor, Department of Computer Engineering, SPIT, Mumbai.
Prof. Yogesh Jadhav, M.E. Computer Engineering
Assistant Professor, Department of Computer Engineering, SPIT, Mumbai.

Course Layout

The first module, Fundamentals of data analytics, covers the basics of data analytics, including fundamental Big Data concepts. This grounding is necessary to understand and work on Big Data applications. The module is accompanied by hands-on problem solving and will help you understand real-world Big Data problems and solve them using various approaches and methods.

Session 1: Introduction to Big Data
Session 2: Classification of types of Big Data
Session 3: Big Data Applications
Session 4: Issues in Big Data
Session 5: Random variables and Probability distributions
Session 6: Regression analysis in statistical analysis of Big Data
Session 7: Big Data Tools and Technologies
Session 8: Processing Big Data