Students will learn many of the most common machine learning methods, including the regression, tree-based, and ensemble methods listed in the lesson schedule below.
This module will teach students how to build and tune these models with R and Python packages that have been tested and approved for their ability to scale well (e.g. glmnet, ranger, xgboost, h2o, scikit-learn). However, the motivation in almost every case is to describe each technique in a way that helps develop intuition for its strengths and weaknesses.
This module will step through the process of building, visualizing, testing, and comparing supervised models. The goal is to expose you to building machine learning models using a variety of algorithms. By the end of this module you should:
This module makes a few assumptions about your programming skills and your exposure to basic statistical concepts. Below are my assumptions and the relevant courses that you should have already attended to make sure you are properly prepared. The material provides examples in both R and Python, so as long as you meet the assumptions below for one language, you will be good to go.
Code to run the examples is included in the main course repository in the ML folder: Misk-DSI-2021-01/ML.
For R users, please run the R/00-setup.Rmd script to install the necessary packages. For Python users, a requirements.txt file will be provided and discussed in class.
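Once setup is done, Python users may want a quick way to confirm that the environment is ready. The following is a minimal sketch, not part of the course repository: it assumes that the Python packages named earlier in this overview (scikit-learn, xgboost, h2o) are among those in requirements.txt, which may in fact list more or different packages. The `ASSUMED_PACKAGES` list and the `missing_packages` helper are hypothetical names introduced for illustration.

```python
from importlib.util import find_spec

# Import names (not pip names) for the Python packages mentioned in this
# module's overview. ASSUMPTION: the actual requirements.txt may list more
# or different packages; adjust this list once the file is provided.
ASSUMED_PACKAGES = ["sklearn", "xgboost", "h2o"]


def missing_packages(import_names):
    """Return the top-level import names not found in the current environment."""
    return [name for name in import_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(ASSUMED_PACKAGES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All assumed packages are importable.")
```

Note that scikit-learn is imported as `sklearn`, so the check uses import names rather than the names you would pass to pip.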
Lesson | Description | Reading(s) | Slides | Source code |
---|---|---|---|---|
1 | Introduction to machine learning | Notebook | HTML | 01-introduction.* |
2 | The modeling process | Notebook | HTML | 02-modeling-process.* |
3 | Feature and target engineering | Notebook | HTML | 03-feature-engineering.* |
4 | ML Portfolio builder #1 | Notebook | | |
5 | Linear regression | Notebook | HTML | 04-linear-regression.* |
6 | Logistic regression | Notebook | HTML | 05-logistic-regression.* |
7 | Regularized regression | Notebook | HTML | 06-regularized-regression.* |
8 | ML Portfolio builder #2 | Notebook | | |
9 | Multivariate adaptive regression splines | Notebook | HTML | 07-mars.* |
10 | K-nearest neighbors | Notebook | HTML | 08-knn.* |
11 | Decision trees | Notebook | HTML | 09-decision-trees.* |
12 | Bagging | Notebook | HTML | 10-bagging.* |
13 | Random forests | Notebook | HTML | 11-random-forests.* |
14 | ML Portfolio builder #3 | Notebook | | |
15 | Gradient boosting | Notebook | HTML | 12-gbm.* |
16 | Stacked models & AutoML | Notebook | HTML | 13-stacking.* |
17 | ML Portfolio builder #4 | Notebook | | |