### MA5755 Data Analysis & Visualization in R/Python/SQL

#### Course Details

– Importance of analytics and visualization in the era of data abundance.

– Review of probability, statistics and random processes.

– Brief introduction to estimation theory.

– Introduction to machine learning, supervised and unsupervised learning, gradient descent, overfitting, regularization.

– Clustering techniques: K-means, Gaussian mixture models and expectation-maximization, agglomerative clustering; evaluation of clustering: Rand index, mutual-information-based scores, Fowlkes-Mallows index.

– Regression: linear models, ordinary least squares, ridge regression, LASSO, Gaussian process regression.

– Supervised classification methods: K-nearest neighbor, naive Bayes, logistic regression, decision tree, support vector machine.

– Introduction to artificial neural networks (ANNs), deep neural networks, convolutional neural networks (CNNs).

– Data visualization: Basic principles, categorical and continuous variables.

– Exploratory graphical analysis: creating static graphs and animated visualizations (loops, GIFs and videos).

– Data visualization in Python and R, examples.
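
As an illustration of the clustering-evaluation topic listed above, the following is a minimal sketch of the (unadjusted) Rand index in plain Python. It is not course material; the function name `rand_index` and the sample labels are assumptions chosen for this example. The index is the fraction of point pairs on which two labelings agree, that is, pairs placed together in both or apart in both.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two
    points in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical labels: the score ignores how clusters are named.
truth = [0, 0, 1, 1]
print(rand_index(truth, [1, 1, 0, 0]))  # same partition, renamed clusters
print(rand_index(truth, [0, 0, 0, 1]))  # partly different partition
```

This version enumerates all pairs, so it runs in quadratic time and is only meant for small examples; library implementations typically work from a contingency table instead.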

#### Course References

**Textbook**

– Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.

**Reference Books**

– Richard O. Duda, Peter E. Hart, and David G. Stork. 2000. Pattern Classification (2nd Edition). Wiley-Interscience, New York, NY, USA.

– Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg.