Hands-on machine learning for predictive analytics

Misk Academy

Overview

Students will learn many of the most common machine learning methods, including:

A proper modeling process
Feature engineering
Linear and logistic regression
Regularized models
K-nearest neighbors
Random forests
Gradient boosting machines
Stacking / super learners
And more!

This module teaches students how to build and tune these models with R and Python packages that have been tested and approved for their ability to scale well (e.g., glmnet, ranger, xgboost, h2o, scikit-learn). However, the motivation in almost every case is to describe each technique in a way that helps develop intuition for its strengths and weaknesses.

Learning Objectives

This module will step through the process of building, visualizing, testing, and comparing supervised models. The goal is to expose you to building machine learning models using a variety of algorithms. By the end of this module you should:

Understand how to apply an end-to-end modeling process that allows you to find an optimal model.
Be able to properly pre-process your feature and target variables.
Interpret, apply and compare today’s most popular and effective machine learning algorithms.
Methodically and efficiently tune these algorithms.
Visualize and compare how features impact these models.
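
As a rough illustration of the end-to-end process these objectives describe, here is a minimal, dependency-free Python sketch: hold out a test set, tune one hyperparameter with cross-validation on the training data only, then evaluate the chosen model once on the test set. It is not the course's code. The synthetic data, the hand-rolled k-nearest-neighbors regressor, and helper names such as `make_data` and `cv_rmse` are assumptions made for illustration; in the lessons the same steps would be carried out with packages such as scikit-learn.

```python
import math
import random


def make_data(n, seed=0):
    """Synthetic regression data, y = 3x + noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 3.0 * x + rng.gauss(0, 1)) for x in xs]


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(train, holdout, k):
    """Root mean squared error of k-NN predictions on the holdout points."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in holdout]
    return math.sqrt(sum(errors) / len(errors))


def cv_rmse(data, k, folds=5):
    """Mean RMSE across the cross-validation folds."""
    scores = []
    for i in range(folds):
        holdout = data[i::folds]
        train = [pt for j, pt in enumerate(data) if j % folds != i]
        scores.append(rmse(train, holdout, k))
    return sum(scores) / folds


# 1. Split: keep a test set that plays no part in tuning.
data = make_data(200)
random.Random(1).shuffle(data)
train, test = data[:160], data[160:]

# 2. Tune: pick k by cross-validation on the training set only.
grid = [1, 3, 5, 9, 15]
cv_scores = {k: cv_rmse(train, k) for k in grid}
best_k = min(cv_scores, key=cv_scores.get)

# 3. Evaluate: score the chosen model once on the untouched test set.
test_rmse = rmse(train, test, best_k)
print(f"best k = {best_k}, test RMSE = {test_rmse:.3f}")
```

The point of the design is that the test set is touched only in the final step, so the reported error is not biased by the tuning search.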

Prework

This module assumes some established programming skills and exposure to basic statistical concepts. Below are my assumptions and the relevant courses that you should have already attended to make sure you are properly prepared. The material provides examples in both R and Python, so as long as you meet the assumptions below in one language, you will be good to go.

Assumptions

Comfortable with R or Python programming
Proficient with basic data wrangling tasks
Knowledgeable of foundational statistics

Code books

Code to run the examples is included in the main course repository in the ML folder: Misk-DSI-2021-01/ML.

For R users, please run the R/00-setup.Rmd script to install the necessary packages. For Python users, a requirements.txt file will be provided and discussed in class.

Lessons

Lesson	Description	Reading(s)	Slides	Source code
1	Introduction to machine learning	Notebook	HTML	`01-introduction.*`
2	The modeling process	Notebook	HTML	`02-modeling-process.*`
3	Feature and target engineering	Notebook	HTML	`03-feature-engineering.*`
4	ML Portfolio builder #1	Notebook
5	Linear regression	Notebook	HTML	`04-linear-regression.*`
6	Logistic regression	Notebook	HTML	`05-logistic-regression.*`
7	Regularized regression	Notebook	HTML	`06-regularized-regression.*`
8	ML Portfolio builder #2	Notebook
9	Multivariate adaptive regression splines	Notebook	HTML	`07-mars.*`
10	K-nearest neighbors	Notebook	HTML	`08-knn.*`
11	Decision trees	Notebook	HTML	`09-decision-trees.*`
12	Bagging	Notebook	HTML	`10-bagging.*`
13	Random forests	Notebook	HTML	`11-random-forests.*`
14	ML Portfolio builder #3	Notebook
15	Gradient boosting	Notebook	HTML	`12-gbm.*`
16	Stacked models & AutoML	Notebook	HTML	`13-stacking.*`
17	ML Portfolio builder #4	Notebook