<h3>MA5755 Data Analysis &amp; Visualization in R/Python/SQL</h3>
<br/>
<h4>Course Details</h4>
– Importance of analytics and visualization in the era of data abundance.<br/>
– Review of probability, statistics and random processes.<br/>
– Brief introduction to estimation theory.<br/>
– Introduction to machine learning: supervised and unsupervised learning, gradient descent, overfitting, regularization.<br/>
– Clustering techniques: K-means, Gaussian mixture models and expectation-maximization, agglomerative clustering; evaluation of clustering (Rand index, mutual-information-based scores, Fowlkes-Mallows index).<br/>
– Regression: linear models, ordinary least squares, ridge regression, LASSO, Gaussian process regression.<br/>
– Supervised classification methods: K-nearest neighbors, naive Bayes, logistic regression, decision trees, support vector machines.<br/>
– Introduction to artificial neural networks (ANNs), deep neural networks (DNNs), convolutional neural networks (CNNs).<br/>
– Data visualization: basic principles, categorical and continuous variables.<br/>
– Exploratory graphical analysis: creating static graphs and animated visualizations (loops, GIFs and videos).<br/>
– Data visualization in Python and R, with examples.<br/>
<br/>
<h4>Course References</h4>
<b>Textbook</b><br/>
– Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer.<br/>
<b>Textbook</b><br/>
– Duda, R. O., Hart, P. E., and Stork, D. G. (2000). Pattern Classification (2nd ed.). Wiley-Interscience.<br/>
– Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.<br/>