class: misk-title-slide

<a href="https://github.com/misk-data-science/misk-homl"><img style="position: absolute; top: 0; right: 0; border: 0;" src="https://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png" alt="Fork me on GitHub"></a>

<br><br><br><br>

# .font120[Hands-on Machine Learning for <br>Predictive Analytics]

---

# Course objectives

<br><br><br>

.font130[
This workshop will step through the process of building, visualizing, testing, and comparing supervised models. The goal is to expose you to building machine learning models in R using a variety of packages and model types.
]

<br><br>

.center.bold[_You will gain deeper knowledge around the analytic modeling process and apply various supervised machine learning algorithms_]

---

# Course overview

.font110[Moving from a machine learning apprentice to journeyman with
<i class="fab fa-r-project faa-FALSE animated " style=" color:steelblue;"></i>
:]

.pull-left[

| Topic                                    |
| :--------------------------------------- |
| Getting started                          |
| Supervised modeling process              |
| Feature & target engineering             |
| Regression & cousins                     |
| Logistic regression                      |
| Regularized regression                   |
| Multivariate adaptive regression splines |

]

--

.pull-right[

| Topic                                    |
| :--------------------------------------- |
| K-nearest neighbors                      |
| Decision trees                           |
| Bagging                                  |
| Random forests                           |
| Gradient boosting                        |
| Stacked models & auto ML                 |

]

<br>

.center.bold[_Plus several portfolio building activities along the way!_]

---

# Class material

<a href="https://github.com/misk-data-science/misk-homl" class="github-corner" aria-label="View source on Github"><svg width="80" height="80" viewBox="0 0 250 250" style="fill:#fff; color:#151513; position: absolute; top: 0; border: 0; right: 0;" aria-hidden="true"><path d="M0,0 L115,115 L130,115 L142,142 L250,250 L250,0 Z"></path><path d="M128.3,109.0 C113.8,99.7 119.0,89.6 119.0,89.6 C122.0,82.7 120.5,78.6 120.5,78.6 C119.2,72.0 123.4,76.3 123.4,76.3 C127.3,80.9 125.5,87.3 125.5,87.3 C122.9,97.6 130.6,101.9 134.4,103.2" fill="currentColor" style="transform-origin: 130px 106px;" class="octo-arm"></path><path d="M115.0,115.0 C114.9,115.1 118.7,116.5 119.8,115.4 L133.7,101.6 C136.9,99.2 139.9,98.4 142.2,98.6 C133.8,88.0 127.5,74.4 143.8,58.0 C148.5,53.4 154.0,51.2 159.7,51.0 C160.3,49.4 163.2,43.6 171.4,40.1 C171.4,40.1 176.1,42.5 178.8,56.2 C183.1,58.6 187.2,61.8 190.9,65.4 C194.5,69.0 197.7,73.2 200.1,77.6 C213.8,80.2 216.3,84.9 216.3,84.9 C212.7,93.1 206.9,96.0 205.4,96.6 C205.1,102.4 203.0,107.8 198.3,112.5 C181.9,128.9 168.3,122.5 157.7,114.1 C157.9,116.9 156.7,120.9 152.7,124.9 L141.0,136.5 C139.8,137.7 141.6,141.9 141.8,141.8 Z" fill="currentColor" class="octo-body"></path></svg></a><style>.github-corner:hover .octo-arm{animation:octocat-wave 560ms ease-in-out}@keyframes octocat-wave{0%,100%{transform:rotate(0)}20%,60%{transform:rotate(-25deg)}40%,80%{transform:rotate(10deg)}}@media (max-width:500px){.github-corner:hover .octo-arm{animation:none}.github-corner .octo-arm{animation:octocat-wave 560ms ease-in-out}}</style>

<br>

### Source code

- <i class="fab fa-github faa-pulse animated-hover "></i> GitHub: [https://github.com/misk-data-science/misk-homl](https://github.com/misk-data-science/misk-homl)
- <i class="fab fa-slideshare faa-pulse animated-hover "></i> Slides
- <i class="fas fa-code faa-pulse animated-hover "></i> Student Scripts
- <i class="fas fa-database faa-pulse animated-hover "></i> Data

---
class: yourturn

# Your Turn!

<br>

## .font140[Meet your classmates:]

.font130[
1. What is their experience with R and machine learning?
2. What programming experience other than R do they have?
3. How are they using, or how do they plan to use, R and machine learning in their job?
]

---
class: misk-section-slide

<br><br><br><br><br><br><br>

.bold.font250[Prerequisites]

---

# Environment

This course uses several R 📦. You should have run `00-setup.Rmd` to ensure you have all required packages.

.scrollable90[
```r
###############################
# Setting Up Your Environment #
###############################

# the following packages will be used
list_of_pkgs <- c(
  "AmesHousing",  # provides data we'll use
  "dslabs",       # provides mnist data
  "tidyverse",    # data munging & visualization
  "reshape2",     # data transformation for one example
  "extracat",     # visualizing missing data (one example)
  "factoextra",   # clustering & PCA visualizations
  "here",         # coordinating paths
  "rsample",      # sampling procedures
  "recipes",      # feature engineering procedures
  "caret",        # meta modeling package
  "h2o",          # meta modeling, model stacking, & auto ML
  "glmnet",       # regularized regression
  "earth",        # multivariate adaptive regression splines
  "ranger",       # fast random forest
  "gbm",          # gradient boosting machines
  "xgboost",      # extreme gradient boosting
  "broom",        # provides model result clean up
  "vip",          # model interpretation
  "pdp",          # model interpretation
  "plotROC",      # plotting ROC curve
  "rprojroot"     # coordinating paths
)

# run the following line of code to install the packages you currently do not have
new_pkgs <- list_of_pkgs[!(list_of_pkgs %in% installed.packages()[,"Package"])]
if(length(new_pkgs)) install.packages(new_pkgs)
```
]

---

# Data

.scrollable90[

Ames, IA property sales information (De Cock, 2011) [
<i class="ai ai-google-scholar faa-tada animated-hover "></i>
](https://www.tandfonline.com/doi/pdf/10.1080/10691898.2011.11889627).

- .bold[problem type]: supervised regression
- .bold[response variable]: sale price (e.g. $195,000, $215,000)
- .bold[features]: 80
- .bold[observations]: 2,930
- .bold[objective]: use property attributes to predict the sale price of a home
- .bold[access]: provided by the `AmesHousing` package
- .bold[more details]: See `?AmesHousing::ames_raw`

```r
# access data
ames <- AmesHousing::make_ames()

# initial dimension
dim(ames)
## [1] 2930   81

# response variable
head(ames$Sale_Price)
## [1] 215000 105000 172000 244000 189900 195500

# first few observations
head(ames)
## # A tibble: 6 x 81
##   MS_SubClass MS_Zoning Lot_Frontage Lot_Area Street Alley Lot_Shape Land_Contour Utilities Lot_Config Land_Slope Neighborhood
##   <fct>       <fct>            <dbl>    <int> <fct>  <fct> <fct>     <fct>        <fct>     <fct>      <fct>      <fct>
## 1 One_Story_… Resident…          141    31770 Pave   No_A… Slightly… Lvl          AllPub    Corner     Gtl        North_Ames
## 2 One_Story_… Resident…           80    11622 Pave   No_A… Regular   Lvl          AllPub    Inside     Gtl        North_Ames
## 3 One_Story_… Resident…           81    14267 Pave   No_A… Slightly… Lvl          AllPub    Corner     Gtl        North_Ames
## 4 One_Story_… Resident…           93    11160 Pave   No_A… Regular   Lvl          AllPub    Corner     Gtl        North_Ames
## 5 Two_Story_… Resident…           74    13830 Pave   No_A… Slightly… Lvl          AllPub    Inside     Gtl        Gilbert
## 6 Two_Story_… Resident…           78     9978 Pave   No_A… Slightly… Lvl          AllPub    Inside     Gtl        Gilbert
## # … with 69 more variables: Condition_1 <fct>, Condition_2 <fct>, Bldg_Type <fct>, House_Style <fct>, Overall_Qual <fct>,
## #   Overall_Cond <fct>, Year_Built <int>, Year_Remod_Add <int>, Roof_Style <fct>, Roof_Matl <fct>, Exterior_1st <fct>,
## #   Exterior_2nd <fct>, Mas_Vnr_Type <fct>, Mas_Vnr_Area <dbl>, Exter_Qual <fct>, Exter_Cond <fct>, Foundation <fct>,
## #   Bsmt_Qual <fct>, Bsmt_Cond <fct>, Bsmt_Exposure <fct>, BsmtFin_Type_1 <fct>, BsmtFin_SF_1 <dbl>, BsmtFin_Type_2 <fct>,
## #   BsmtFin_SF_2 <dbl>, Bsmt_Unf_SF <dbl>, Total_Bsmt_SF <dbl>, Heating <fct>, Heating_QC <fct>, Central_Air <fct>, Electrical <fct>,
## #   First_Flr_SF <int>, Second_Flr_SF <int>, Low_Qual_Fin_SF <int>, Gr_Liv_Area <int>, Bsmt_Full_Bath <dbl>, Bsmt_Half_Bath <dbl>,
## #   Full_Bath <int>, Half_Bath <int>, Bedroom_AbvGr <int>, Kitchen_AbvGr <int>, Kitchen_Qual <fct>, TotRms_AbvGrd <int>,
## #   Functional <fct>, Fireplaces <int>, Fireplace_Qu <fct>, Garage_Type <fct>, Garage_Finish <fct>, Garage_Cars <dbl>,
## #   Garage_Area <dbl>, Garage_Qual <fct>, Garage_Cond <fct>, Paved_Drive <fct>, Wood_Deck_SF <int>, Open_Porch_SF <int>,
## #   Enclosed_Porch <int>, Three_season_porch <int>, Screen_Porch <int>, Pool_Area <int>, Pool_QC <fct>, Fence <fct>,
## #   Misc_Feature <fct>, Misc_Val <int>, Mo_Sold <int>, Year_Sold <int>, Sale_Type <fct>, Sale_Condition <fct>, Sale_Price <int>,
## #   Longitude <dbl>, Latitude <dbl>
```
]

---
class: yourturn

# Your Turn!

<br><br>

.font120[
To get warmed up, let's do some basic exploratory data analysis, such as exploratory visualizations or summary statistics, with this data set. The idea is to get a feel for the data. Let's take 5-10 minutes and work with your neighbors.
]

---
class: clear, center, middle, hide-logo
background-image: url(images/any-questions.jpg)
background-position: center
background-size: cover

---

# Back home

<br><br><br><br>

[.center[
<i class="fas fa-home fa-10x faa-FALSE animated "></i>
]](https://github.com/misk-data-science/misk-homl) .center[https://github.com/misk-data-science/misk-homl]