class: misk-title-slide

<br><br><br><br><br>
# .font150[Regression & Cousins]

---
# Introduction

.pull-left[

.center.bold.font120[Thoughts]

- a fundamental analytic method
- still widely used
- basic approaches make strong assumptions
- serves as a foundation for many extension methods

]

--

.pull-right[

.center.bold.font120[Overview]

- Ordinary Least Squares
- Principal Component Regression
- Partial Least Squares Regression

]

---
# Prereqs .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 1]

.pull-left[

.center.bold.font120[Packages]

```r
library(dplyr)
library(ggplot2)
library(rsample)
library(recipes)
library(vip)
library(caret)
```

]

.pull-right[

.center.bold.font120[Data]

```r
# ames data
ames <- AmesHousing::make_ames()

# split data
set.seed(123)
split <- initial_split(ames, strata = "Sale_Price")
ames_train <- training(split)
```

]

---
# The Objective

<img src="04-linear-regression-slides_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

* Model form: `\(y_i = \beta_0 + \beta_{1}x_{i1} + \beta_{2}x_{i2} + \cdots + \beta_{p}x_{ip} + \epsilon_i\)`

* Objective function: `\(\text{minimize} \bigg \{ SSE = \sum^n_{i=1} (y_i - \hat{y}_i)^2 \bigg \} \equiv \text{minimize MSE}\)`

---
# Simple linear regression .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 2]

.pull-left.font120[

- .bold.blue[`lm()`] performs OLS in base R

- `glm()` also performs linear regression but extends to other generalized linear models (e.g., logistic regression)

- `summary(model)` provides many results (e.g., the "Residual standard error" is a degrees-of-freedom adjusted RMSE)

- No built-in method for resampling (e.g., cross validation) with `lm()`

]

.pull-right[

```r
model1 <- lm(Sale_Price ~ Gr_Liv_Area, data = ames_train)

summary(model1)
## 
## Call:
## lm(formula = Sale_Price ~ Gr_Liv_Area, data = ames_train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -407824  -30437   -2006   22838  331807 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9929.42    3769.58   2.634   0.0085 ** 
## Gr_Liv_Area   114.18       2.39  47.783   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 55950 on 2197 degrees of freedom
## Multiple R-squared:  0.5096, Adjusted R-squared:  0.5094
## F-statistic:  2283 on 1 and 2197 DF,  p-value: < 2.2e-16
```

]
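
---
# Simple linear regression

To see how the reported "Residual standard error" relates to the RMSE, here is a quick sketch using `model1`:

```r
# RMSE: square root of the mean squared residual (divides by n)
sqrt(mean(residuals(model1)^2))

# residual standard error reported by summary(): same quantity, but the squared
# residuals are divided by the residual degrees of freedom (n - 2)
sigma(model1)
```

The two values are nearly identical here because *n* is large relative to the number of coefficients.

---
# Multiple linear regression .red[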
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 3] .pull-left[ ```r # OLS model with two predictors model2 <- lm(Sale_Price ~ Gr_Liv_Area + Year_Built, data = ames_train) # OLS model with specified interactions model3 <- lm(Sale_Price ~ Gr_Liv_Area + Year_Built + Gr_Liv_Area : Year_Built, data = ames_train) # include all possible main effects model4 <- lm(Sale_Price ~ ., data = ames_train) ``` ] .pull-right[
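
An aside on formula syntax: `a * b` is shorthand for `a + b + a:b`, so the interaction model can be written more compactly. A sketch equivalent to `model3`:

```r
# `*` expands to both main effects plus their interaction
model3_alt <- lm(Sale_Price ~ Gr_Liv_Area * Year_Built, data = ames_train)
```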
] --- # Assessing model accuracy .pull-left[ We've fit four models to the Ames housing data: 1. a single predictor, 2. two predictors, 3. two predictors with interaction, 4. and all possible main effect predictors. <br> .center.bold.blue[Which model is "best"?] ] --- # Assessing model accuracy .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 4] .scrollable90[ .pull-left[ We've fit four models to the Ames housing data: 1. a single predictor, 2. two predictors, 3. two predictors with interaction, 4. and all possible main effect predictors. <br> .center.bold.blue[Which model is "best"?] ] .pull-right[ ```r # create a resampling method cv <- trainControl( method = "repeatedcv", number = 10, repeats = 5 ) # model 1 CV set.seed(123) (cv_model1 <- train( Sale_Price ~ Gr_Liv_Area, data = ames_train, * method = "lm", trControl = cv) ) ## Linear Regression ## ## 2199 samples ## 1 predictor ## ## No pre-processing ## Resampling: Cross-Validated (10 fold, repeated 5 times) ## Summary of sample sizes: 1979, 1978, 1980, 1980, 1979, 1980, ... ## Resampling results: ## ## RMSE Rsquared MAE ## 55890.15 0.5134204 38553.52 ## ## Tuning parameter 'intercept' was held constant at a value of TRUE ``` ] ] --- # Assessing model accuracy .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 5]

.scrollable90[

.pull-left[

We've fit four models to the Ames housing data:

1. a single predictor,
2. two predictors,
3. two predictors with interaction,
4. and all possible main effect predictors.

<br>

.center.bold.blue[Model using most predictors is clearly superior]

]

.pull-right[

```r
# model 2 CV
set.seed(123)
cv_model2 <- train(
  Sale_Price ~ Gr_Liv_Area + Year_Built, 
  data = ames_train, 
  method = "lm",
  trControl = cv
)

# model 3 CV
set.seed(123)
cv_model3 <- train(
  Sale_Price ~ Gr_Liv_Area + Year_Built + Gr_Liv_Area : Year_Built, 
  data = ames_train, 
  method = "lm",
  trControl = cv
)

# model 4 CV
set.seed(123)
cv_model4 <- train(
  Sale_Price ~ ., 
  data = ames_train, 
  method = "lm",
  trControl = cv
)

# Extract out of sample performance measures
summary(resamples(list(
  model1 = cv_model1, 
  model2 = cv_model2, 
  model3 = cv_model3,
  model4 = cv_model4
)))
## 
## Call:
## summary.resamples(object = resamples(list(model1 = cv_model1, model2 = cv_model2, model3
##  = cv_model3, model4 = cv_model4)))
## 
## Models: model1, model2, model3, model4 
## Number of resamples: 50 
## 
## MAE 
##            Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
## model1 34904.18 37559.58 38510.78 38553.52 39715.04 43242.37    0
## model2 27583.27 30698.46 31629.63 31811.28 32997.49 36053.96    0
## model3 26380.87 29511.34 30238.31 30571.59 31933.90 34979.59    0
## model4 14371.94 15392.35 16089.85 16455.34 17307.06 20146.35    0
## 
## RMSE 
##            Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
## model1 47642.64 53673.02 55630.20 55890.15 58676.44 68887.67    0
## model2 36226.34 43489.06 45281.67 46036.15 48681.86 60533.98    0
## model3 34530.49 42048.83 43631.66 45186.34 46945.23 64159.82    0
## model4 20272.41 22556.43 25810.57 28368.21 29749.65 55864.79    0
## 
## Rsquared 
##             Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## model1 0.3859977 0.4780213 0.5104276 0.5134204 0.5534873 0.6617992    0
## model2 0.5431509 0.6500207 0.6783304 0.6703415 0.6948392 0.7485206    0
## model3 0.4860618 0.6630967 0.6965823 0.6832889 0.7243789 0.7770264    0
## model4 0.6804975 0.8687873 0.8978879 0.8731645 0.9217784 0.9370737    0
```

]
]

---
# Model concerns

.pull-left[

<br><br><br>
.bold.center[With simplistic models come many assumptions... often at the expense of model performance]

]

.pull-right[

<br>
<img src="https://media1.tenor.com/images/3c888132eb6fbedec9a131bc55a05315/tenor.gif?itemid=10744949" style="display: block; margin: auto;" />

]

---
# Model concerns

.pull-left[

1. .bold.red[Linear relationship]
2. Constant variance among residuals
3. No autocorrelation
4. More observations than predictors
5. No or little multicollinearity

<br>

.bold.center[<u>Sometimes</u> we can resolve this with transformations]

]

.pull-right[

<img src="04-linear-regression-slides_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

]

---
# Model concerns

.pull-left[

1. Linear relationship
2. .bold.red[Constant variance among residuals]
3. No autocorrelation
4. More observations than predictors
5. No or little multicollinearity

<br>

.bold.center[<u>Sometimes</u> we can resolve this with transformations or adding more features]

]

.pull-right[

<img src="04-linear-regression-slides_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

]

---
# Model concerns

.pull-left[

1. Linear relationship
2. Constant variance among residuals
3. .bold.red[No autocorrelation]
4. More observations than predictors
5.
No or little multicollinearity <br> .bold.center[<u>Sometimes</u> we can resolve this by adding more features] ] .pull-right[ <img src="04-linear-regression-slides_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" /> ] --- # Model concerns .pull-left[ 1. Linear relationship 2. Constant variance among residuals 3. No autocorrelation 4. .bold.red[More observations than predictors] 5. No or little multicollinearity <br> .bold.center[<u>Sometimes</u> we can resolve this with feature reduction techniques] ] .pull-right[ <table class="table table-striped" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> y </th> <th style="text-align:right;"> x1 </th> <th style="text-align:right;"> x2 </th> <th style="text-align:right;"> x3 </th> <th style="text-align:right;"> x4 </th> <th style="text-align:right;"> x5 </th> <th style="text-align:right;"> x6 </th> <th style="text-align:right;"> x7 </th> <th style="text-align:right;"> x8 </th> <th style="text-align:right;"> x9 </th> <th style="text-align:right;"> x10 </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 129002 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 5 </td> </tr> <tr> <td style="text-align:right;"> 371454 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 7 </td> </tr> <tr> <td style="text-align:right;"> 389726 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 2 </td> </tr> <tr> <td style="text-align:right;"> 309319 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 5 </td> </tr> <tr> <td style="text-align:right;"> 258595 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 5 </td> </tr> </tbody> </table> .bold.center.red[Not invertible --> solutions are non-unique meaning there are many "right" solutions for our feature coefficients!] ] --- # Model concerns .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 6]

.pull-left[

1. Linear relationship
2. Constant variance among residuals
3. No autocorrelation
4. More observations than predictors
5. .bold.red[No or little multicollinearity]

<br>

.bold.center[<u>Sometimes</u> we can resolve this with feature reduction techniques]

]

.pull-right[

```r
m1 <- lm(Sale_Price ~ Gr_Liv_Area + TotRms_AbvGrd, data = ames_train)
m2 <- lm(Sale_Price ~ Gr_Liv_Area, data = ames_train)
m3 <- lm(Sale_Price ~ TotRms_AbvGrd, data = ames_train)

*coef(m1)
##   (Intercept)   Gr_Liv_Area TotRms_AbvGrd 
##    38967.4910      142.1163   -11040.8135

*coef(m2)
## (Intercept) Gr_Liv_Area 
##   9929.4188    114.1796

*coef(m3)
##   (Intercept) TotRms_AbvGrd 
##      16212.42      25649.47
```

]

---
# Model concerns

.pull-left[

1. Linear relationship
2. Constant variance among residuals
3. No autocorrelation
4. More observations than predictors
5. No or little multicollinearity

]

.pull-right[

<img src="http://tinderdistrict.com/wp-content/uploads/2018/06/complicated.gif" style="display: block; margin: auto;" />

]

--

<br><br>
.bold.center[Many regression extensions have been developed to deal with these concerns.]

---
class: misk-section-slide

<br><br><br><br><br><br><br>
.bold.font250[Principal Component Regression]

---
# The idea

.pull-left[

PCR performs feature reduction to help minimize the impact of:

- multicollinearity (becomes a bigger concern the more predictors we have)
- when `\(p \gg n\)`

Steps:

1. Reduce *p* features to *c* PCs (not guided by the response)
2. Use PCs as predictors and perform regression as usual (a manual sketch follows the R packages slide)

]

.pull-right[

<img src="images/pcr-steps.png" width="86%" height="86%" style="display: block; margin: auto;" />

]

---
# R packages 📦

<br>

.font130[

- Any package that implements PCA can be applied prior to modeling
- See the [multivariate task view]( https://CRAN.R-project.org/view=Multivariate ) on CRAN for options; however,...
- .bold[caret] provides an integrated `method = "pcr"` that helps to automate the tuning process

]
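
---
# The idea: a manual sketch

Before the caret implementation on the next slide, here is a minimal, illustrative sketch of the two PCR steps using only `prcomp()` and `lm()` (the choice of 20 PCs is arbitrary; caret tunes `ncomp` for us):

```r
# step 1: PCA on the standardized model matrix (the response is not involved)
X <- model.matrix(Sale_Price ~ ., data = ames_train)[, -1]  # dummy encode, drop intercept
X <- X[, apply(X, 2, sd) > 0]                                # drop zero-variance columns
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# step 2: use the first c = 20 PC scores as predictors in an ordinary lm()
pcs <- as.data.frame(pca$x[, 1:20])
pcs$Sale_Price <- ames_train$Sale_Price
pcr_fit <- lm(Sale_Price ~ ., data = pcs)
```

---
# Implementation .red[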
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 7] .pull-left[ ```r # 1. hypergrid hyper_grid <- expand.grid(ncomp = seq(2, 40, by = 2)) # 2. PCR set.seed(123) cv_pcr <- train( Sale_Price ~ ., data = ames_train, trControl = cv, * method = "pcr", * preProcess = c("zv", "center", "scale"), * tuneGrid = hyper_grid, metric = "RMSE" ) # model with lowest RMSE cv_pcr$bestTune ## ncomp ## 20 40 cv_pcr$results %>% filter(ncomp == as.numeric(cv_pcr$bestTune)) ## ncomp RMSE Rsquared MAE RMSESD RsquaredSD MAESD ## 1 40 32213.17 0.8379086 21118.75 5805.984 0.04868926 1462.987 ``` ] .pull-right[ ```r # plot cross-validated RMSE plot(cv_pcr) ``` <img src="04-linear-regression-slides_files/figure-html/pcr-plot-revised-1.png" style="display: block; margin: auto;" /> .center.bold[Feature reduction with PCR improves prediction error by ~ $10K] ] --- # Tuning .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 8] .scrollable90[ .pull-left[ - The number of PCs is the only hyperparameter - rule of
<img src="https://emojis.slackmojis.com/emojis/images/1511903783/3230/wiggle_thumbs_up.gif?1511903783" style="height:1em; width:auto; "/>
- assess 2-*p* in evenly divided segments
- start with a few and zoom in

]

.pull-right[

```r
# 1. hypergrid
p <- length(ames_train) - 1
*hyper_grid <- expand.grid(ncomp = seq(2, 80, length.out = 10))

# 2. PCR
set.seed(123)
cv_pcr <- train(
  Sale_Price ~ ., 
  data = ames_train, 
  trControl = cv,
  method = "pcr", 
  preProcess = c("zv", "center", "scale"),
  tuneGrid = hyper_grid,
  metric = "RMSE"
)

# RMSE
cv_pcr$results %>%
  filter(ncomp == cv_pcr$bestTune$ncomp)
##   ncomp    RMSE  Rsquared      MAE   RMSESD RsquaredSD   MAESD
## 1    80 31498.8 0.8450051 20308.17 5942.775  0.0507097 1304.71

# plot cross-validated RMSE
plot(cv_pcr)
```

<img src="04-linear-regression-slides_files/figure-html/pcr-grid-2-1.png" style="display: block; margin: auto;" />

]
]

---
class: misk-section-slide

<br><br><br><br><br><br><br>
.bold.font250[Partial Least Squares Regression]

---
# The idea

.pull-left[

- A problem with PCR is that the PCs are developed independently of the response.

- PLS
    - has similar intentions to PCR
    - finds PCs that maximize correlation with the response
    - typically results in a stronger signal between PCs and response

]

.pull-right[

<img src="images/pls-steps.png" width="94%" height="94%" style="display: block; margin: auto;" />

]

---
# The idea

.pull-left[

- A problem with PCR is that the PCs are developed independently of the response.

- PLS
    - has similar intentions to PCR
    - finds PCs that maximize correlation with the response
    - .bold.blue[typically results in a stronger signal between PCs and response]

]

.pull-right[

<img src="04-linear-regression-slides_files/figure-html/pls-vs-pcr-relationship-1.png" style="display: block; margin: auto;" />

]

---
# R packages 📦

.pull-left[

## [`pls`](https://cran.r-project.org/package=pls)

* **p**artial **l**east **s**quares
* Original and primary implementation of PLS
* Provides both PLS & PCR capabilities

]

.pull-right[

## [Other pkgs](https://CRAN.R-project.org/view=Multivariate)

* `ppls`: penalized partial least squares
* `dr`: provides various dimension reduction regression options
* `plsgenomics`: provides partial least squares analyses for genomics

]
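
---
# A quick `pls` sketch

A minimal, illustrative example with the `pls` package directly (the predictors and `ncomp` here are arbitrary choices for demonstration; the caret implementation on the next slide tunes `ncomp` over the full feature set):

```r
library(pls)

set.seed(123)
pls_fit <- plsr(
  Sale_Price ~ Gr_Liv_Area + Year_Built + Total_Bsmt_SF + Garage_Area,
  data = ames_train, ncomp = 4, scale = TRUE, validation = "CV"
)

# cross-validated RMSE for 1-4 components
validationplot(pls_fit, val.type = "RMSEP")
```

---
# Implementation .red[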
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 9]

.pull-left[

```r
# PLS
set.seed(123)
cv_pls <- train(
  Sale_Price ~ ., 
  data = ames_train, 
  trControl = cv,
* method = "pls",
  preProcess = c("zv", "center", "scale"),
  tuneGrid = hyper_grid,
  metric = "RMSE"
)

# model with lowest RMSE
cv_pls$bestTune
##      ncomp
## 8 62.66667

cv_pls$results %>%
  filter(ncomp == as.numeric(cv_pls$bestTune))
##      ncomp    RMSE  Rsquared      MAE   RMSESD RsquaredSD    MAESD
## 1 62.66667 28142.7 0.8750775 16323.37 8060.319  0.0664182 1427.488
```

]

.pull-right[

```r
# plot cross-validated RMSE
plot(cv_pls)
```

<img src="04-linear-regression-slides_files/figure-html/pls-plot-1.png" style="display: block; margin: auto;" />

.center.bold[PLS gives the lowest cross-validated RMSE of the models so far (~$28.1K)]

]

---
# Tuning

- The number of PCs is the only hyperparameter
- Will almost always require fewer PCs than PCR
- rule of
<img src="https://emojis.slackmojis.com/emojis/images/1511903783/3230/wiggle_thumbs_up.gif?1511903783" style="height:1em; width:auto; "/>
- assess 2-*p* in evenly divided segments - start with a few and zoom in --- class: misk-section-slide <br><br><br><br><br><br><br> .bold.font250[Model Comparison] --- # Comparing error distributions .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 10]

.pull-left[

```r
results <- resamples(list(
  OLS = cv_model4, 
  PCR = cv_pcr, 
  PLS = cv_pls
))

summary(results)$statistics$RMSE
##         Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
## OLS 20272.41 22556.43 25810.57 28368.21 29749.65 55864.79    0
## PCR 24562.41 26892.00 29479.23 31498.80 34003.86 49406.53    0
## PLS 20008.20 22404.48 25721.42 28142.70 30311.02 55722.45    0
```

]

.pull-right[

```r
p1 <- bwplot(results, metric = "RMSE")
p2 <- dotplot(results, metric = "RMSE")
gridExtra::grid.arrange(p1, p2, nrow = 1)
```

<img src="04-linear-regression-slides_files/figure-html/compare-bwplot-1.png" style="display: block; margin: auto;" />

]

.center.bold[A paired Student's *t*-test or a rank sum test on the resampled RMSEs could also be used to formally compare models (see the sketch on the next slide).]
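
---
# Comparing error distributions

One way to run that comparison is a sketch like the following, using caret's `diff()` method for `resamples` objects (it performs paired comparisons of the resampled metric values):

```r
# paired differences in cross-validated RMSE between OLS, PCR, and PLS
rmse_diffs <- diff(results, metric = "RMSE")
summary(rmse_diffs)
```

---
class: misk-section-slide

<br><br><br><br><br><br><br>
.bold.font250[Feature Interpretation]

---
# Feature importance .red[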
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 11]

.pull-left[

.center.bold.font120[vip]

* .bold[v]ariable .bold[i]mportance .bold[p]lots illustrate the influence each predictor has
* many packages have their own variable importance plots
* the __vip__ 📦 provides a common output
* different models measure "importance" differently
* we'll review this more in depth in later modules

]

.pull-right[

```r
vip(cv_pls)
```

<img src="04-linear-regression-slides_files/figure-html/pls-vip-1.png" style="display: block; margin: auto;" />

]

---
# Feature importance .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 12] <img src="04-linear-regression-slides_files/figure-html/all-vip-1.png" style="display: block; margin: auto;" /> --- # Feature effects .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 13]

.pull-left[

* feature effects measure the relationship between a feature and the target variable
* the most common approach is a .bold[p]artial .bold[d]ependence .bold[p]lot
* computes the average response value when all observations use a particular value for a given feature (see the manual sketch on the next slide)

]

.pull-right[

```r
pdp::partial(cv_model2, pred.var = "Gr_Liv_Area", grid.resolution = 10) %>% 
  autoplot()
```

<img src="04-linear-regression-slides_files/figure-html/ols-pdp-1.png" style="display: block; margin: auto;" />

]
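
---
# Feature effects

A manual sketch of what `pdp::partial()` computes for a single grid value (illustrative only; the grid values and the `pd_value` helper are made up for this example, and `pdp` handles the full grid and plotting for us):

```r
# partial dependence at one value: set every observation's Gr_Liv_Area to that
# value, predict, and average the predictions
pd_value <- function(model, data, value) {
  data$Gr_Liv_Area <- value
  mean(predict(model, newdata = data))
}

grid <- seq(500, 3500, by = 500)
sapply(grid, pd_value, model = cv_model2, data = ames_train)
```

---
# Feature effects .red[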
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 14] <br><br> <img src="04-linear-regression-slides_files/figure-html/all-pdps, -1.png" style="display: block; margin: auto;" /> --- # Feature effects .red[
<i class="fas fa-hand-point-right faa-horizontal animated " style=" color:red;"></i>
code chunk 15]

Assess the interaction of 2 predictors:

```r
pdp::partial(cv_model2, pred.var = c("Gr_Liv_Area", "Year_Built"), grid.resolution = 10) %>% 
  pdp::plotPartial(levelplot = FALSE, zlab = "yhat", drape = TRUE, 
                   colorkey = TRUE, screen = list(z = -20, x = -60))
```

<img src="04-linear-regression-slides_files/figure-html/interaction-pdp-1.png" style="display: block; margin: auto;" />

---
class: misk-section-slide

<br><br><br><br><br><br><br>
.bold.font250[Wrapping up]

---
# Summary

* Ordinary least squares
    - simple but makes many assumptions
    - typically poor predictive accuracy

* Principal Component Regression
    - minimizes multicollinearity
    - helps when `\(p \gg n\)`

* Partial Least Squares
    - same benefits as PCR but
    - creates a stronger signal between PCs and the target

---
class: clear, center, middle, hide-logo
background-image: url(images/any-questions.jpg)
background-position: center
background-size: cover

---
# Back home

<br><br><br><br>

[.center[
<i class="fas fa-home fa-10x faa-FALSE animated "></i>
]](https://github.com/misk-data-science/misk-homl) .center[https://github.com/misk-data-science/misk-homl]