8 Programming with the Tidyverse

8.1 Learning Objectives

How to write custom functions that are flexible and take advantage of non-standard evaluation prevalent in the tidyverse.

8.2 Introduction

In the last chapter we covered the fundamentals of writing custom functions. In the exercises we also saw a few variations on writing functions. Let’s review exercise 7.8 before moving on. In the task, you were given yy, an object of class lm and type list:

# yy is an object of class lm
yy <- lm(mpg ~ wt, data = mtcars)
class(yy)
#> [1] "lm"
typeof(yy)
#> [1] "list"

You were tasked with writing a function, assigning yy to the generic argument x, that returns sum(x$residuals^2). The solution should be familiar to you by now.


# Define the function =====
# _1 uses dollar notation
calc_SSR_1 <- function(x) {
  sum(x$residuals^2)
}

# Test the function
calc_SSR_1(yy)
#> [1] 278.3219

Let’s explore this function in more depth with another exercise.

Exercise 8.1 Recall yy, the lm class object we defined above. This object is a type list comprised of 12 named elements. Define a new function: lm_extract(). This function will take two arguments. First, x will take an lm class object. Second, item will take a 1-element long character vector which specifies which named element to return.

The valid names are:

# The names of each element 
names(yy)
#>  [1] "coefficients"  "residuals"     "effects"      
#>  [4] "rank"          "fitted.values" "assign"       
#>  [7] "qr"            "df.residual"   "xlevels"      
#> [10] "call"          "terms"         "model"

As an example, the output of your function should be:

lm_extract(x = yy, item = "fitted.values")

We’ll review the solution in our first group session. Here’s a hint: Recall that we can access a named element using either $ notation or indexing with [] and for lists we can also use [[]].
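As a reminder of how the three access notations differ, here is a minimal sketch on a small made-up list. The names parts and item are assumptions for illustration only and are not part of the exercise.

```r
# A small hypothetical list, unrelated to the exercise data
parts <- list(alpha = 1:3, beta = "text")

# The name of the element we want, stored as a string
item <- "alpha"

parts$alpha     # $ takes the name literally
parts$item      # NULL: $ looks for an element literally called "item"
parts[item]     # single brackets return a list of length 1
parts[[item]]   # double brackets return the element itself
```

Note that only the bracket notations use the value stored in item; the $ notation ignores it.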

Before we move on, let’s try one more trick to showcase how R looks for an object.

Exercise 8.2 Consider lm_extract(x = yy, item = "fitted.values") from above. Is it possible to only specify item and not x? Indeed, it is possible and there are many different solutions. Try to solve this exercise on your own before looking at the solutions given below.

The function call should be:

lm_extract("fitted.values")

Make a solid attempt to develop your own solution before looking at the ones below.

One solution could be to define the object yy inside the function definition for lm_extract().

lm_extract_internal <- function(item) {
  lm(mpg ~ wt, data = mtcars)[[item]]
}

lm_extract_internal("fitted.values")
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710 
#>           23.282611           21.919770           24.885952 
#>      Hornet 4 Drive   Hornet Sportabout             Valiant 
#>           20.102650           18.900144           18.793255 
#>          Duster 360           Merc 240D            Merc 230 
#>           18.205363           20.236262           20.450041 
#>            Merc 280           Merc 280C          Merc 450SE 
#>           18.900144           18.900144           15.533127 
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood 
#>           17.350247           17.083024            9.226650 
#> Lincoln Continental   Chrysler Imperial            Fiat 128 
#>            8.296712            8.718926           25.527289 
#>         Honda Civic      Toyota Corolla       Toyota Corona 
#>           28.653805           27.478021           24.111004 
#>    Dodge Challenger         AMC Javelin          Camaro Z28 
#>           18.472586           18.926866           16.762355 
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2 
#>           16.735633           26.943574           25.847957 
#>        Lotus Europa      Ford Pantera L        Ferrari Dino 
#>           29.198941           20.343151           22.480940 
#>       Maserati Bora          Volvo 142E 
#>           18.205363           22.427495

This could be a solution, but it should be used with caution. That’s because we’ve hard-coded yy, the lm class object. We didn’t even provide a way to assign a different data set.

Notice that we didn’t need to create any new objects inside the function definition. After calling lm() we worked directly with the resulting object, e.g. [[item]], without an intermediate step. But where does the data frame mtcars come from? In this case we’re kind of cheating since it’s a built-in object and we’ll always have access to it. But what if it were an object we made that only exists in our Global Environment, i.e. the workspace where user-defined objects typically exist? Let’s see that in action by making a copy of mtcars, called mtcars_global, which only exists in the Global Environment.

mtcars_global <- mtcars

lm_extract_global <- function(item) {
  lm(mpg ~ wt, data = mtcars_global)[[item]]
}
lm_extract_global("fitted.values")
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710 
#>           23.282611           21.919770           24.885952 
#>      Hornet 4 Drive   Hornet Sportabout             Valiant 
#>           20.102650           18.900144           18.793255 
#>          Duster 360           Merc 240D            Merc 230 
#>           18.205363           20.236262           20.450041 
#>            Merc 280           Merc 280C          Merc 450SE 
#>           18.900144           18.900144           15.533127 
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood 
#>           17.350247           17.083024            9.226650 
#> Lincoln Continental   Chrysler Imperial            Fiat 128 
#>            8.296712            8.718926           25.527289 
#>         Honda Civic      Toyota Corolla       Toyota Corona 
#>           28.653805           27.478021           24.111004 
#>    Dodge Challenger         AMC Javelin          Camaro Z28 
#>           18.472586           18.926866           16.762355 
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2 
#>           16.735633           26.943574           25.847957 
#>        Lotus Europa      Ford Pantera L        Ferrari Dino 
#>           29.198941           20.343151           22.480940 
#>       Maserati Bora          Volvo 142E 
#>           18.205363           22.427495

It still works! Can you explain why?

The reason has to do with scoping. To understand this, we should begin by understanding the many environments in R. When you start a new R session and subsequently attach packages by using the library() function, you create a chain of nested environments. The last environment is the Global Environment, the workspace where your custom user-defined objects typically exist.

We also call (i.e. execute, or run) functions inside the Global Environment, either user-defined, like lm_extract_global(), or any of those provided by the attached packages or base R. When a function is called, it creates an additional, further nested environment at the end of, and isolated from, the rest of the environment chain. The function first works with the objects in its own environment. If an object is needed but not present, as with mtcars_global, then the function searches for it in its parent environment, which for a function defined in the Global Environment is the Global Environment. If it’s not there, R keeps going all the way back up the chain of attached packages to base R. This is exactly what happened when we used mtcars inside lm_extract_internal(). The built-in mtcars comes from the datasets package, which is attached by default, as are many of the functions that we call, so R finds it eventually by going back through the chain of nested environments.
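The lookup order described above can be sketched with a small example. The names outer_value, inner_value and scope_demo() are hypothetical and only serve to show where R finds each object.

```r
# Hypothetical object defined outside the function
outer_value <- "found in the enclosing environment"

scope_demo <- function() {
  # Defined inside the function, so it is found first
  inner_value <- "found in the function environment"
  # outer_value is not defined here, so R looks one level up
  c(inner_value, outer_value)
}

scope_demo()

# The environment in which the function was defined
environment(scope_demo)

# The attached packages that R searches after the Global Environment
search()
```

The exact output of search() depends on which packages are attached in your session.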

8.3 dplyr and NSE, Non-Standard Evaluation

Using tidyverse functions inside custom functions can cause problems, in particular because dplyr uses non-standard evaluation (NSE): it captures the expressions you type and evaluates them later, in the context of the data frame. This delayed execution is very handy in interactive use, but it can cause some issues inside functions. Let’s consider some examples.

This is completely fine to use in interactive mode:

library(dplyr)

mtcars %>% 
  as_tibble() %>% 
  filter(cyl == 8)
#> # A tibble: 14 × 17
#>      mpg cyl    disp    hp  drat    wt  qsec    vs am   
#>    <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
#>  1  18.7 8      360    175  3.15  3.44  17.0     0 0    
#>  2  14.3 8      360    245  3.21  3.57  15.8     0 0    
#>  3  16.4 8      276.   180  3.07  4.07  17.4     0 0    
#>  4  17.3 8      276.   180  3.07  3.73  17.6     0 0    
#>  5  15.2 8      276.   180  3.07  3.78  18       0 0    
#>  6  10.4 8      472    205  2.93  5.25  18.0     0 0    
#>  7  10.4 8      460    215  3     5.42  17.8     0 0    
#>  8  14.7 8      440    230  3.23  5.34  17.4     0 0    
#>  9  15.5 8      318    150  2.76  3.52  16.9     0 0    
#> 10  15.2 8      304    150  3.15  3.44  17.3     0 0    
#> 11  13.3 8      350    245  3.73  3.84  15.4     0 0    
#> 12  19.2 8      400    175  3.08  3.84  17.0     0 0    
#> 13  15.8 8      351    264  4.22  3.17  14.5     0 1    
#> 14  15   8      301    335  3.54  3.57  14.6     0 1    
#> # … with 8 more variables: gear <dbl>, carb <dbl>,
#> #   cyl_f <fct>, vs_f <fct>, am_f <fct>, gear_f <fct>,
#> #   carb_f <fct>, car <chr>

But inside a function, it doesn’t work. Can you see why not?

getCyl <- function(cyl) {
  mtcars %>% 
    as_tibble() %>% 
    filter(cyl == cyl)
}

getCyl(8)
#> # A tibble: 32 × 17
#>      mpg cyl    disp    hp  drat    wt  qsec    vs am   
#>    <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
#>  1  21   6      160    110  3.9   2.62  16.5     0 1    
#>  2  21   6      160    110  3.9   2.88  17.0     0 1    
#>  3  22.8 4      108     93  3.85  2.32  18.6     1 1    
#>  4  21.4 6      258    110  3.08  3.22  19.4     1 0    
#>  5  18.7 8      360    175  3.15  3.44  17.0     0 0    
#>  6  18.1 6      225    105  2.76  3.46  20.2     1 0    
#>  7  14.3 8      360    245  3.21  3.57  15.8     0 0    
#>  8  24.4 4      147.    62  3.69  3.19  20       1 0    
#>  9  22.8 4      141.    95  3.92  3.15  22.9     1 0    
#> 10  19.2 6      168.   123  3.92  3.44  18.3     1 0    
#> # … with 22 more rows, and 8 more variables: gear <dbl>,
#> #   carb <dbl>, cyl_f <fct>, vs_f <fct>, am_f <fct>,
#> #   gear_f <fct>, carb_f <fct>, car <chr>

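All 32 rows are returned: inside filter(), both sides of cyl == cyl refer to the column cyl, so the comparison is TRUE for every row and the function argument is never used. How about this?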

getMetric <- function(metric, value) {
  mtcars %>% 
    as_tibble() %>% 
    filter(metric == value)
}

getMetric("cyl", 8)
#> # A tibble: 0 × 17
#> # … with 17 variables: mpg <dbl>, cyl <fct>, disp <dbl>,
#> #   hp <dbl>, drat <dbl>, wt <dbl>, qsec <dbl>, vs <dbl>,
#> #   am <fct>, gear <dbl>, carb <dbl>, cyl_f <fct>,
#> #   vs_f <fct>, am_f <fct>, gear_f <fct>, carb_f <fct>,
#> #   car <chr>

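Here metric holds the string "cyl" rather than a column, so filter() compares the string "cyl" with 8, which is FALSE for every row, and no rows are returned. In base R, we would use this: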

getMetric <- function(metric, value) {
  mtcars[mtcars[[metric]] == value,]
}

getMetric("cyl", 8)
#>                      mpg cyl  disp  hp drat    wt  qsec vs
#> Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0
#> Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0
#> Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0
#> Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0
#> Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0
#> Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0
#> Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0
#> Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0
#> Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0
#> AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0
#> Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0
#> Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0
#> Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0
#> Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0
#>                     am gear carb cyl_f vs_f am_f gear_f
#> Hornet Sportabout    0    3    2     8    0    0      3
#> Duster 360           0    3    4     8    0    0      3
#> Merc 450SE           0    3    3     8    0    0      3
#> Merc 450SL           0    3    3     8    0    0      3
#> Merc 450SLC          0    3    3     8    0    0      3
#> Cadillac Fleetwood   0    3    4     8    0    0      3
#> Lincoln Continental  0    3    4     8    0    0      3
#> Chrysler Imperial    0    3    4     8    0    0      3
#> Dodge Challenger     0    3    2     8    0    0      3
#> AMC Javelin          0    3    2     8    0    0      3
#> Camaro Z28           0    3    4     8    0    0      3
#> Pontiac Firebird     0    3    2     8    0    0      3
#> Ford Pantera L       1    5    4     8    0    1      5
#> Maserati Bora        1    5    8     8    0    1      5
#>                     carb_f                 car
#> Hornet Sportabout        2   Hornet Sportabout
#> Duster 360               4          Duster 360
#> Merc 450SE               3          Merc 450SE
#> Merc 450SL               3          Merc 450SL
#> Merc 450SLC              3         Merc 450SLC
#> Cadillac Fleetwood       4  Cadillac Fleetwood
#> Lincoln Continental      4 Lincoln Continental
#> Chrysler Imperial        4   Chrysler Imperial
#> Dodge Challenger         2    Dodge Challenger
#> AMC Javelin              2         AMC Javelin
#> Camaro Z28               4          Camaro Z28
#> Pontiac Firebird         2    Pontiac Firebird
#> Ford Pantera L           4      Ford Pantera L
#> Maserati Bora            8       Maserati Bora

However, if we pass the data frame as an argument and name the column directly, the dplyr version works:

getMetric <- function(df, value) {
  df %>% 
    as_tibble() %>% 
    filter(cyl == value)
}

getMetric(mtcars, 8)
#> # A tibble: 14 × 17
#>      mpg cyl    disp    hp  drat    wt  qsec    vs am   
#>    <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
#>  1  18.7 8      360    175  3.15  3.44  17.0     0 0    
#>  2  14.3 8      360    245  3.21  3.57  15.8     0 0    
#>  3  16.4 8      276.   180  3.07  4.07  17.4     0 0    
#>  4  17.3 8      276.   180  3.07  3.73  17.6     0 0    
#>  5  15.2 8      276.   180  3.07  3.78  18       0 0    
#>  6  10.4 8      472    205  2.93  5.25  18.0     0 0    
#>  7  10.4 8      460    215  3     5.42  17.8     0 0    
#>  8  14.7 8      440    230  3.23  5.34  17.4     0 0    
#>  9  15.5 8      318    150  2.76  3.52  16.9     0 0    
#> 10  15.2 8      304    150  3.15  3.44  17.3     0 0    
#> 11  13.3 8      350    245  3.73  3.84  15.4     0 0    
#> 12  19.2 8      400    175  3.08  3.84  17.0     0 0    
#> 13  15.8 8      351    264  4.22  3.17  14.5     0 1    
#> 14  15   8      301    335  3.54  3.57  14.6     0 1    
#> # … with 8 more variables: gear <dbl>, carb <dbl>,
#> #   cyl_f <fct>, vs_f <fct>, am_f <fct>, gear_f <fct>,
#> #   carb_f <fct>, car <chr>

There is another problem. If the column does not exist in the data frame but an object with the same name exists in the global environment, filter() silently uses the global object instead of raising an error.

# An object in the global environment
cyl <- 1

# mtcars, excluding the cyl variable
mtcars_noCyl <- mtcars %>% 
  select(-cyl)

getMetric <- function(df, value) {
  df %>% 
    as_tibble() %>% 
    filter(cyl == value)
}

# This silent failure returns a value, not an error!
getMetric(mtcars_noCyl, 8)
#> # A tibble: 0 × 16
#> # … with 16 variables: mpg <dbl>, disp <dbl>, hp <dbl>,
#> #   drat <dbl>, wt <dbl>, qsec <dbl>, vs <dbl>, am <fct>,
#> #   gear <dbl>, carb <dbl>, cyl_f <fct>, vs_f <fct>,
#> #   am_f <fct>, gear_f <fct>, carb_f <fct>, car <chr>

To avoid this, we need to use the .data pronoun to explicitly refer to the data set at hand.


getMetric <- function(df, value) {
  df %>% 
    as_tibble() %>% 
    filter(.data$cyl == value)
}

getMetric(mtcars_noCyl, 8)

This results in the following error as we expect it should:

Error: Problem with `filter()` input `..1`.
ℹ Input `..1` is `.data$cyl == value`.
x Column `cyl` not found in `.data`
Run `rlang::last_error()` to see where the error occurred.

If we continue with our analogy of dplyr as the grammar of data analysis, we can consider .data as a pronoun. Just like pronouns in regular language, .data refers to something else, here the current data frame, but it makes the reference explicit rather than implicit, which is a good idea, as we’ll see below. Using .data is the best practice when using dplyr in functions and is required when writing packages. Note also that .data is an internal object in the dplyr grammar. It has nothing to do with the actual name of the data frame, even if we happen to name a function argument data.
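The same explicitness can repair the earlier getCyl() problem. The sketch below is one possible fix, not the only one. It assumes the companion pronoun .env, which tidy evaluation provides alongside .data to refer to objects outside the data frame and which this chapter does not otherwise cover. The function name get_cyl_explicit() is hypothetical.

```r
library(dplyr)

# Hypothetical rewrite of getCyl(): the argument keeps the name cyl,
# but each side of the comparison now states where to look
get_cyl_explicit <- function(df, cyl) {
  df %>%
    filter(.data$cyl == .env$cyl)   # column cyl vs. function argument cyl
}

get_cyl_explicit(mtcars, 8)
```

Renaming the argument so that it no longer clashes with a column name would also work, but the pronouns keep working even when a clash cannot be ruled out.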

To appreciate the importance of the .data pronoun, consider that in this example, we refer to mtcars$cyl implicitly.

# Implicit reference to data frame (mtcars):
mtcars %>% 
  group_by(cyl) %>% 
  summarise(avg = mean(mpg))
#> # A tibble: 3 × 2
#>   cyl     avg
#>   <fct> <dbl>
#> 1 4      26.7
#> 2 6      19.7
#> 3 8      15.1

We can do that in a function and it still works fine:

# A problem if this object exists in the
# global environment but not in the data frame
mpg <- 1:4

# Implicit reference to data frame (mtcars):
calcSummary <- function(data) {
  data %>% 
    group_by(cyl) %>% 
    summarise(avg = mean(mpg))

}

calcSummary(mtcars)
#> # A tibble: 3 × 2
#>   cyl     avg
#>   <fct> <dbl>
#> 1 4      26.7
#> 2 6      19.7
#> 3 8      15.1

But we would prefer to make it explicit:

# Explicit reference to data:
calcSummary <- function(data) {
  data %>% 
    group_by(cyl) %>% 
    summarise(avg = mean(.data$mpg))
}

calcSummary(mtcars)
#> # A tibble: 3 × 2
#>   cyl     avg
#>   <fct> <dbl>
#> 1 4      26.7
#> 2 6      19.7
#> 3 8      15.1

We can also refer to column names stored as strings, e.g.:

# Explicit reference using a string is not possible with $ notation
mtcars %>% 
  group_by(cyl) %>% 
  summarise(avg = mean(.data$cyl))
#> Warning in mean.default(.data$cyl): argument is not numeric
#> or logical: returning NA

#> Warning in mean.default(.data$cyl): argument is not numeric
#> or logical: returning NA

#> Warning in mean.default(.data$cyl): argument is not numeric
#> or logical: returning NA
#> # A tibble: 3 × 2
#>   cyl     avg
#>   <fct> <dbl>
#> 1 4        NA
#> 2 6        NA
#> 3 8        NA

# Explicit reference using a string works with [[]]
myVar <- "mpg"

mtcars %>% 
  group_by(cyl) %>% 
  summarise(avg = mean(.data[[myVar]]))
#> # A tibble: 3 × 2
#>   cyl     avg
#>   <fct> <dbl>
#> 1 4      26.7
#> 2 6      19.7
#> 3 8      15.1

Here, the index myVar is evaluated outside the data frame, meaning that a column named myVar in the data frame cannot mask it. Thus, we can use the .data pronoun safely in functions and packages. .data also respects grouped data frames.
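Combining .data with [[]] gives a dplyr counterpart to the base R getMetric() shown earlier. This is a minimal sketch; the name get_metric_tidy() and its argument order are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical dplyr version of getMetric():
# metric is a column name given as a string, value is what to match
get_metric_tidy <- function(df, metric, value) {
  df %>%
    filter(.data[[metric]] == value)
}

get_metric_tidy(mtcars, "cyl", 8)
get_metric_tidy(mtcars, "gear", 5)
```

If metric names a column that does not exist, .data raises an error rather than failing silently.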

8.4 Quoting, unquoting and quosures

Exercise 8.3 To understand how to access column variables as variable names themselves and not character strings, consider the following function definition and two function calls. Which one will execute the function correctly and produce a plot?

library(ggplot2)

myScatter <- function(.data, x, y, z = NULL) {
  ggplot(.data) + 
    geom_point(aes(x, y)) +
    theme_classic()
}

# Function Call 1:
myScatter(mtcars, wt, mpg) 

# Function Call 2:
myScatter(mtcars, "wt", "mpg")

The answer is neither one! Can you explain why?

We wouldn’t usually write the name of a variable in quotation marks when calling plotting functions in interactive mode. Although strings can be convenient in a custom function, it’s not typical, and we would often rather pass the bare variable names. Let’s take a look at both in turn.

For function call 1, we’ll show a detailed solution to give you context, and then a shorter, easier version which will be preferable. Let’s begin with the detailed version.

The reason that passing bare variable names through a function argument doesn’t work is that tidyverse functions such as aes() quote their arguments: they capture the expression as typed, here the argument name x, rather than the variable the user supplied. To get around this we can use quosures. A quosure is a special type of formula obtained using the quo() function. It contains a quoted expression and information about the environment where it came from, i.e. the scope. R is lexically scoped, meaning that functions carry with them a reference to the environment within which they were defined.

Here, we make a quosure with the enquo() function. The en prefix tells us that it captures the expression the user supplied as a function argument, rather than an expression written inside the function.

# To properly use function call 1, above:
myScatter <- function(.data, x, y, z = NULL) {
  x <- enquo(x)
  y <- enquo(y)

  ggplot(.data) + 
    geom_point(aes(!!x, !!y)) +
    theme_classic()
}

# Function Call 1:
myScatter(mtcars, wt, mpg)
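To see what enquo() actually captures, we can inspect a quosure directly. This is a small sketch using rlang; the helper capture_arg() is hypothetical and does nothing except return the quosure.

```r
library(rlang)

# Hypothetical helper that only captures its argument
capture_arg <- function(arg) {
  enquo(arg)
}

q <- capture_arg(wt / 2)

quo_get_expr(q)   # the quoted expression, not yet evaluated
quo_get_env(q)    # the environment in which the expression was written

# Evaluate later, with mtcars supplying the column wt
head(eval_tidy(q, data = mtcars))
```

Nothing is computed until eval_tidy() is called, which is why wt does not need to exist as an object when capture_arg() runs.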

This process has been simplified with the introduction of double curly brackets, {{ }}, which “embrace” the variable, i.e. quote and unquote it in one step:

# Use embracing to simplify the function and properly use function call 1, above:
myScatter <- function(.data, x, y, z = NULL) {
  ggplot(.data) + 
    geom_point(aes({{x}}, {{y}})) +
    theme_classic()
}

# Function Call 1:
myScatter(mtcars, wt, mpg)
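Embracing works the same way inside dplyr verbs. The following sketch sorts a data frame by a column given as a bare name; the function top_by() and its default n = 3 are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical helper: return the n rows with the largest values of var
top_by <- function(data, var, n = 3) {
  data %>%
    arrange(desc({{ var }})) %>%   # {{ var }} passes the bare column name on
    head(n)
}

top_by(mtcars, mpg)
top_by(mtcars, hp, n = 1)
```

The caller writes the column name exactly as in interactive dplyr code, without quotation marks.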

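Alternatively, we could have used standard evaluation by providing a string. One solution would be to use aes_string() instead of aes(), which expects input as a string. Note that aes_string() is deprecated in recent versions of ggplot2, so treat this as a legacy approach: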

# To properly use function call 2:
myScatter <- function(.data, x, y, z = NULL) {
  ggplot(.data) + 
    geom_point(aes_string(x, y)) +
    theme_classic()
}

# Function Call 2:
myScatter(mtcars, "wt", "mpg")

However, aes_string() is only a shortcut. The more explicit and flexible approach is to refer to the variable by name using the .data pronoun with [[]] notation, as we have seen earlier:

# To properly use function call 2:
myScatter <- function(.data, x, y, z = NULL) {
  ggplot(.data) + 
    geom_point(aes(.data[[x]], .data[[y]])) +
    theme_classic()
}

# Function Call 2:
myScatter(mtcars, "wt", "mpg")

Using a character vector to refer to a variable is also an issue in interactive mode. Here, the object var refers to the data-variable, but this doesn’t work:

var <- "mpg"

# We force `var`, which substitutes it with the string "mpg", not the column
mtcars %>%
  group_by(cyl) %>% 
  summarise(avg = mean(!!var))
#> Warning in mean.default("mpg"): argument is not numeric or
#> logical: returning NA

#> Warning in mean.default("mpg"): argument is not numeric or
#> logical: returning NA

#> Warning in mean.default("mpg"): argument is not numeric or
#> logical: returning NA
#> # A tibble: 3 × 2
#>   cyl     avg
#>   <fct> <dbl>
#> 1 4        NA
#> 2 6        NA
#> 3 8        NA

So we need to convert the string to a symbol using sym() and then unquote it using !!:

myVar <- sym("mpg")

# We force `myVar`, which substitutes it with the symbol `mpg`
mtcars %>%
  group_by(cyl) %>% 
  summarise(avg = mean(!!myVar))
#> # A tibble: 3 × 2
#>   cyl     avg
#>   <fct> <dbl>
#> 1 4      26.7
#> 2 6      19.7
#> 3 8      15.1

We need both !! and sym(): sym() turns the string into a symbol, and !! unquotes the symbol or call that follows (say “unquote” or “bang bang”). We can see how !! works here:

a <- 10
b <- 90

expr(log10(!!a + b)) 
#> log10(10 + b)
expr(log10(!!(a + b)))
#> log10(100)
log10(a + b)
#> [1] 2

If we are unquoting values, e.g. numbers to compare against a variable, rather than column names, we don’t need to use sym():


myQuery1 <- 8
myQuery2 <- 1

mtcars %>%
  filter(cyl == !!myQuery1, am == !!myQuery2)
#>                 mpg cyl disp  hp drat   wt qsec vs am gear
#> Ford Pantera L 15.8   8  351 264 4.22 3.17 14.5  0  1    5
#> Maserati Bora  15.0   8  301 335 3.54 3.57 14.6  0  1    5
#>                carb cyl_f vs_f am_f gear_f carb_f
#> Ford Pantera L    4     8    0    1      5      4
#> Maserati Bora     8     8    0    1      5      8
#>                           car
#> Ford Pantera L Ford Pantera L
#> Maserati Bora   Maserati Bora

To make things more confusing, you may also encounter !!!, which unquotes a vector or list and splices the results as arguments into the surrounding call. Pronounce this as “unquote splice” or “bang-bang-bang.” In the example below, 8 and base = 2 are spliced in as the arguments for log(). Outside of tidy evaluation, !!! is simply a triple negation of a logical vector, so you can see how this may cause some confusion!


input1 <- 8
input2 <- 2
  
x <- list(input1, base = input2)

myExpression <- expr(log(!!!x)) 

myExpression
#> log(8, base = 2)

eval(myExpression)
#> [1] 3

We can see this in action here.

mtcars %>% 
  select(cyl, mpg)
#>                     cyl  mpg
#> Mazda RX4             6 21.0
#> Mazda RX4 Wag         6 21.0
#> Datsun 710            4 22.8
#> Hornet 4 Drive        6 21.4
#> Hornet Sportabout     8 18.7
#> Valiant               6 18.1
#> Duster 360            8 14.3
#> Merc 240D             4 24.4
#> Merc 230              4 22.8
#> Merc 280              6 19.2
#> Merc 280C             6 17.8
#> Merc 450SE            8 16.4
#> Merc 450SL            8 17.3
#> Merc 450SLC           8 15.2
#> Cadillac Fleetwood    8 10.4
#> Lincoln Continental   8 10.4
#> Chrysler Imperial     8 14.7
#> Fiat 128              4 32.4
#> Honda Civic           4 30.4
#> Toyota Corolla        4 33.9
#> Toyota Corona         4 21.5
#> Dodge Challenger      8 15.5
#> AMC Javelin           8 15.2
#> Camaro Z28            8 13.3
#> Pontiac Firebird      8 19.2
#> Fiat X1-9             4 27.3
#> Porsche 914-2         4 26.0
#> Lotus Europa          4 30.4
#> Ford Pantera L        8 15.8
#> Ferrari Dino          6 19.7
#> Maserati Bora         8 15.0
#> Volvo 142E            4 21.4

vars <- c("cyl", "mpg")
mtcars %>% 
  select(vars)
#> Note: Using an external vector in selections is ambiguous.
#> ℹ Use `all_of(vars)` instead of `vars` to silence this message.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.
#>                     cyl  mpg
#> Mazda RX4             6 21.0
#> Mazda RX4 Wag         6 21.0
#> Datsun 710            4 22.8
#> Hornet 4 Drive        6 21.4
#> Hornet Sportabout     8 18.7
#> Valiant               6 18.1
#> Duster 360            8 14.3
#> Merc 240D             4 24.4
#> Merc 230              4 22.8
#> Merc 280              6 19.2
#> Merc 280C             6 17.8
#> Merc 450SE            8 16.4
#> Merc 450SL            8 17.3
#> Merc 450SLC           8 15.2
#> Cadillac Fleetwood    8 10.4
#> Lincoln Continental   8 10.4
#> Chrysler Imperial     8 14.7
#> Fiat 128              4 32.4
#> Honda Civic           4 30.4
#> Toyota Corolla        4 33.9
#> Toyota Corona         4 21.5
#> Dodge Challenger      8 15.5
#> AMC Javelin           8 15.2
#> Camaro Z28            8 13.3
#> Pontiac Firebird      8 19.2
#> Fiat X1-9             4 27.3
#> Porsche 914-2         4 26.0
#> Lotus Europa          4 30.4
#> Ford Pantera L        8 15.8
#> Ferrari Dino          6 19.7
#> Maserati Bora         8 15.0
#> Volvo 142E            4 21.4


vars <- syms(c("cyl", "mpg"))

mtcars %>% select(!!!vars)
#>                     cyl  mpg
#> Mazda RX4             6 21.0
#> Mazda RX4 Wag         6 21.0
#> Datsun 710            4 22.8
#> Hornet 4 Drive        6 21.4
#> Hornet Sportabout     8 18.7
#> Valiant               6 18.1
#> Duster 360            8 14.3
#> Merc 240D             4 24.4
#> Merc 230              4 22.8
#> Merc 280              6 19.2
#> Merc 280C             6 17.8
#> Merc 450SE            8 16.4
#> Merc 450SL            8 17.3
#> Merc 450SLC           8 15.2
#> Cadillac Fleetwood    8 10.4
#> Lincoln Continental   8 10.4
#> Chrysler Imperial     8 14.7
#> Fiat 128              4 32.4
#> Honda Civic           4 30.4
#> Toyota Corolla        4 33.9
#> Toyota Corona         4 21.5
#> Dodge Challenger      8 15.5
#> AMC Javelin           8 15.2
#> Camaro Z28            8 13.3
#> Pontiac Firebird      8 19.2
#> Fiat X1-9             4 27.3
#> Porsche 914-2         4 26.0
#> Lotus Europa          4 30.4
#> Ford Pantera L        8 15.8
#> Ferrari Dino          6 19.7
#> Maserati Bora         8 15.0
#> Volvo 142E            4 21.4
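The message printed above for select(vars) points to a third option for character vectors: wrapping the vector in all_of(), which avoids both the ambiguity and the need for syms() and !!!. A minimal sketch follows; the object name selected is an assumption for illustration.

```r
library(dplyr)

# Column names stored as a character vector
vars <- c("cyl", "mpg")

# all_of() states explicitly that vars is an external vector of names
selected <- mtcars %>%
  select(all_of(vars))

names(selected)
```

all_of() raises an error if any of the names is missing from the data frame, which is usually what we want inside a function.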

Exercise 8.4 Can you complete the following function definition?

getMean <- function(data, group_var, summary_var) {

  
}

The following commands should work:

getMean(mtcars, cyl, mpg)
#> # A tibble: 3 × 2
#>   cyl    mean
#>   <fct> <dbl>
#> 1 4      26.7
#> 2 6      19.7
#> 3 8      15.1

getMean(mtcars, am, hp)
#> # A tibble: 2 × 2
#>   am     mean
#>   <fct> <dbl>
#> 1 0      160.
#> 2 1      127.

getMean(iris, Species, Petal.Length)
#> # A tibble: 3 × 2
#>   Species     mean
#>   <fct>      <dbl>
#> 1 setosa      1.46
#> 2 versicolor  4.26
#> 3 virginica   5.55

Exercise 8.5 Now you’re ready to return to the Project Challenge you selected earlier in the course. Try to complete the function calls as much as you can using tidyverse functions and the programming concepts presented in this chapter.

8.5 Cheat sheets:

Cheat sheets already exist as a great reference guide for some of the typical functions we’ll use in this workshop:

tidyeval
