MA5755 Data Analysis & Visualization in R/Python/SQL

Course Details

– Importance of analytics and visualization in the era of data abundance.
– Review of probability, statistics and random processes.
– Brief introduction to estimation theory.
– Introduction to machine learning, supervised and unsupervised learning, gradient descent, overfitting, regularization.
– Clustering techniques: K-means, Gaussian mixture models and expectation-maximization, agglomerative clustering; evaluation of clustering: Rand index, mutual-information-based scores, Fowlkes-Mallows index.
– Regression: Linear models, ordinary least squares, ridge regression, LASSO, Gaussian process regression.
– Supervised classification methods: K-nearest neighbors, naive Bayes, logistic regression, decision trees, support vector machines.
– Introduction to artificial neural networks (ANNs), deep neural networks, convolutional neural networks (CNNs).
– Data visualization: Basic principles, categorical and continuous variables.
– Exploratory graphical analysis: creating static graphs and animated visualizations (loops, GIFs and videos).
– Data visualization in Python and R, with examples.

Course References

– Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
– Richard O. Duda, Peter E. Hart, and David G. Stork. 2000. Pattern Classification (2nd Edition). Wiley-Interscience, New York, NY, USA.
– Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg.