8 Programming with the Tidyverse
8.1 Learning Objectives
How to write custom functions that are flexible and take advantage of non-standard evaluation prevalent in the tidyverse.
8.2 Introduction
In the last chapter we covered the fundamentals of writing custom functions. In the exercises we also saw a few variations on writing functions. Let’s review exercise 7.8 before moving on. In the task, you were given yy, an object of class lm and type list:
# yy is an object of class lm
yy <- lm(mpg ~ wt, data = mtcars)
class(yy)
#> [1] "lm"
typeof(yy)
#> [1] "list"
You were tasked with writing a function that takes yy as the generic argument x and returns sum(x$residuals^2). The solution should be familiar to you by now.
# Define the function =====
# _1 uses dollar notation
calc_SSR_1 <- function(x) {
  sum(x$residuals^2)
}
# Test the function
calc_SSR_1(yy)
#> [1] 278.3219
Let’s explore this function in more depth with another exercise.
Consider yy, the lm class object we defined above. This object is of type list and comprises 12 named elements. Define a new function, lm_extract(). This function takes two arguments. First, x takes an lm class object. Second, item takes a character vector of length one that specifies which named element to return.
The valid names are:
# The names of each element
names(yy)
#> [1] "coefficients" "residuals" "effects"
#> [4] "rank" "fitted.values" "assign"
#> [7] "qr" "df.residual" "xlevels"
#> [10] "call" "terms" "model"
As an example, the output of your function should be:
lm_extract(x = yy, item = "fitted.values")
We’ll review the solution in our first group session. Here’s a hint: Recall that we can access a named element using either $ notation or indexing with [], and for lists we can also use [[]].
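As a reminder of how the three access methods differ, here is a minimal sketch on a small list. The list fruit and the variable key are hypothetical names made up for this illustration and are not part of the exercise.

```r
# A small, hypothetical list used only for illustration
fruit <- list(apples = 3, pears = c(1, 2))
key <- "pears"

fruit$pears        # $ takes the element name literally
fruit["pears"]     # [ ] returns a list of length one
fruit[["pears"]]   # [[ ]] returns the element itself
fruit[[key]]       # [[ ]] also accepts a name stored in a variable
fruit$key          # NULL, because no element is literally named "key"
```

The last two lines are the ones to keep in mind when the name of the element arrives as a function argument.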
Before we move on, let’s try one more trick to showcase how R looks for an object.
Consider the call lm_extract(x = yy, item = "fitted.values") from above. Is it possible to specify only item and not x? Indeed, it is possible, and there are many different solutions. Try to solve this exercise on your own before looking at the solutions given below.
The function call should be:
lm_extract("fitted.values")
Make a solid attempt to develop your own solution before looking at the ones below.
One solution could be to define the object yy inside the function definition for lm_extract().
lm_extract_internal <- function(item) {
  lm(mpg ~ wt, data = mtcars)[[item]]
}
lm_extract_internal("fitted.values")
#> Mazda RX4 Mazda RX4 Wag Datsun 710
#> 23.282611 21.919770 24.885952
#> Hornet 4 Drive Hornet Sportabout Valiant
#> 20.102650 18.900144 18.793255
#> Duster 360 Merc 240D Merc 230
#> 18.205363 20.236262 20.450041
#> Merc 280 Merc 280C Merc 450SE
#> 18.900144 18.900144 15.533127
#> Merc 450SL Merc 450SLC Cadillac Fleetwood
#> 17.350247 17.083024 9.226650
#> Lincoln Continental Chrysler Imperial Fiat 128
#> 8.296712 8.718926 25.527289
#> Honda Civic Toyota Corolla Toyota Corona
#> 28.653805 27.478021 24.111004
#> Dodge Challenger AMC Javelin Camaro Z28
#> 18.472586 18.926866 16.762355
#> Pontiac Firebird Fiat X1-9 Porsche 914-2
#> 16.735633 26.943574 25.847957
#> Lotus Europa Ford Pantera L Ferrari Dino
#> 29.198941 20.343151 22.480940
#> Maserati Bora Volvo 142E
#> 18.205363 22.427495
This could be a solution, but it should be used with caution. That’s because we’ve hard-coded the definition of yy, the lm class object. We didn’t even provide a way to assign a different data set.
Notice that we don’t need to create any new objects inside the function definition. After calling lm(), we worked directly with the resulting object, e.g. [[item]], without an intermediate step. But where does the data frame mtcars come from? In this case we’re kind of cheating, since it’s a built-in object and we’ll always have access to it. But what if it were an object that we made and that exists only in our Global Environment, i.e. the workspace where user-defined objects typically exist? Let’s see that in action by making a copy of mtcars, called mtcars_global, which exists only in the Global Environment.
mtcars_global <- mtcars
lm_extract_global <- function(item) {
  lm(mpg ~ wt, data = mtcars_global)[[item]]
}
lm_extract_global("fitted.values")
#> Mazda RX4 Mazda RX4 Wag Datsun 710
#> 23.282611 21.919770 24.885952
#> Hornet 4 Drive Hornet Sportabout Valiant
#> 20.102650 18.900144 18.793255
#> Duster 360 Merc 240D Merc 230
#> 18.205363 20.236262 20.450041
#> Merc 280 Merc 280C Merc 450SE
#> 18.900144 18.900144 15.533127
#> Merc 450SL Merc 450SLC Cadillac Fleetwood
#> 17.350247 17.083024 9.226650
#> Lincoln Continental Chrysler Imperial Fiat 128
#> 8.296712 8.718926 25.527289
#> Honda Civic Toyota Corolla Toyota Corona
#> 28.653805 27.478021 24.111004
#> Dodge Challenger AMC Javelin Camaro Z28
#> 18.472586 18.926866 16.762355
#> Pontiac Firebird Fiat X1-9 Porsche 914-2
#> 16.735633 26.943574 25.847957
#> Lotus Europa Ford Pantera L Ferrari Dino
#> 29.198941 20.343151 22.480940
#> Maserati Bora Volvo 142E
#> 18.205363 22.427495
It still works! Can you explain why?
The reason has to do with scoping. To understand this, we should begin by understanding the many environments in R. When you start a new R session and subsequently attach packages by using the library() function, you create a chain of nested environments. The last environment in this chain is the Global Environment, the workspace where your custom user-defined objects typically exist.
We also call (i.e. execute, or run) functions inside the Global Environment, either user-defined, like lm_extract_global(), or any of those provided by the attached packages or base R. When a function is called, it creates an additional, temporary environment of its own. The function first works with the objects in its own environment. If an object is needed but not present, as with mtcars_global, then the function searches for it in its enclosing environment, the one in which the function was defined, which here is the Global Environment. If the object is not there, the search continues back up the chain, through the attached packages, all the way to base R. This is exactly what happened when we used mtcars inside lm_extract_internal(). That object is not defined inside the function; it comes from the datasets package, which R attaches by default. R finds it, like the many functions we call from attached packages, by going back through the chain of nested environments.
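The lookup described above can be sketched with a toy function. The names offset_global and add_offset() are hypothetical and exist only for this illustration.

```r
# A hypothetical object that lives only in the Global Environment
offset_global <- 10

add_offset <- function(x) {
  # offset_global is not defined inside the function, so R looks for it
  # in the enclosing environment, i.e. where add_offset() was defined
  x + offset_global
}

add_offset(1)             # 11

# The enclosing environment of the function
environment(add_offset)

# The chain of attached environments that is searched after that
search()
```

Relying on objects outside the function keeps the call short, but the result then depends on the state of the workspace, which is why passing the object as an argument is usually the safer design.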
8.3 dplyr and NSE, Non-Standard Evaluation
There are some problems when using tidyverse functions within custom functions, in particular because dplyr uses non-standard evaluation (NSE). NSE lets dplyr capture the expressions you type and evaluate them later in the context of the data frame, which is very handy, but can cause some issues. Let’s consider some examples.
This is completely fine to use in interactive mode:
library(dplyr)
mtcars %>%
  as_tibble() %>%
  filter(cyl == 8)
#> # A tibble: 14 × 17
#> mpg cyl disp hp drat wt qsec vs am
#> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 18.7 8 360 175 3.15 3.44 17.0 0 0
#> 2 14.3 8 360 245 3.21 3.57 15.8 0 0
#> 3 16.4 8 276. 180 3.07 4.07 17.4 0 0
#> 4 17.3 8 276. 180 3.07 3.73 17.6 0 0
#> 5 15.2 8 276. 180 3.07 3.78 18 0 0
#> 6 10.4 8 472 205 2.93 5.25 18.0 0 0
#> 7 10.4 8 460 215 3 5.42 17.8 0 0
#> 8 14.7 8 440 230 3.23 5.34 17.4 0 0
#> 9 15.5 8 318 150 2.76 3.52 16.9 0 0
#> 10 15.2 8 304 150 3.15 3.44 17.3 0 0
#> 11 13.3 8 350 245 3.73 3.84 15.4 0 0
#> 12 19.2 8 400 175 3.08 3.84 17.0 0 0
#> 13 15.8 8 351 264 4.22 3.17 14.5 0 1
#> 14 15 8 301 335 3.54 3.57 14.6 0 1
#> # … with 8 more variables: gear <dbl>, carb <dbl>,
#> # cyl_f <fct>, vs_f <fct>, am_f <fct>, gear_f <fct>,
#> # carb_f <fct>, car <chr>
But inside a function, it doesn’t work. Can you see why not?
getCyl <- function(cyl) {
  mtcars %>%
    as_tibble() %>%
    filter(cyl == cyl)
}
getCyl(8)
#> # A tibble: 32 × 17
#> mpg cyl disp hp drat wt qsec vs am
#> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1
#> 2 21 6 160 110 3.9 2.88 17.0 0 1
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0
#> # … with 22 more rows, and 8 more variables: gear <dbl>,
#> # carb <dbl>, cyl_f <fct>, vs_f <fct>, am_f <fct>,
#> # gear_f <fct>, carb_f <fct>, car <chr>
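All 32 rows are returned. Inside filter(), both sides of cyl == cyl refer to the column cyl, which masks the function argument of the same name, so the condition is TRUE for every row.
How about this?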
getMetric <- function(metric, value) {
  mtcars %>%
    as_tibble() %>%
    filter(metric == value)
}
getMetric("cyl", 8)
#> # A tibble: 0 × 17
#> # … with 17 variables: mpg <dbl>, cyl <fct>, disp <dbl>,
#> # hp <dbl>, drat <dbl>, wt <dbl>, qsec <dbl>, vs <dbl>,
#> # am <fct>, gear <dbl>, carb <dbl>, cyl_f <fct>,
#> # vs_f <fct>, am_f <fct>, gear_f <fct>, carb_f <fct>,
#> # car <chr>
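This time no rows are returned. Here metric holds the string "cyl", so filter() compares that string with 8, which is never TRUE.
In base R, we would use this: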
getMetric <- function(metric, value) {
  mtcars[mtcars[[metric]] == value, ]
}
getMetric("cyl", 8)
#> mpg cyl disp hp drat wt qsec vs
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0
#> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0
#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0
#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0
#> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0
#> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0
#> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0
#> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0
#> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0
#> Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0
#> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0
#> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0
#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0
#> am gear carb cyl_f vs_f am_f gear_f
#> Hornet Sportabout 0 3 2 8 0 0 3
#> Duster 360 0 3 4 8 0 0 3
#> Merc 450SE 0 3 3 8 0 0 3
#> Merc 450SL 0 3 3 8 0 0 3
#> Merc 450SLC 0 3 3 8 0 0 3
#> Cadillac Fleetwood 0 3 4 8 0 0 3
#> Lincoln Continental 0 3 4 8 0 0 3
#> Chrysler Imperial 0 3 4 8 0 0 3
#> Dodge Challenger 0 3 2 8 0 0 3
#> AMC Javelin 0 3 2 8 0 0 3
#> Camaro Z28 0 3 4 8 0 0 3
#> Pontiac Firebird 0 3 2 8 0 0 3
#> Ford Pantera L 1 5 4 8 0 1 5
#> Maserati Bora 1 5 8 8 0 1 5
#> carb_f car
#> Hornet Sportabout 2 Hornet Sportabout
#> Duster 360 4 Duster 360
#> Merc 450SE 3 Merc 450SE
#> Merc 450SL 3 Merc 450SL
#> Merc 450SLC 3 Merc 450SLC
#> Cadillac Fleetwood 4 Cadillac Fleetwood
#> Lincoln Continental 4 Lincoln Continental
#> Chrysler Imperial 4 Chrysler Imperial
#> Dodge Challenger 2 Dodge Challenger
#> AMC Javelin 2 AMC Javelin
#> Camaro Z28 4 Camaro Z28
#> Pontiac Firebird 2 Pontiac Firebird
#> Ford Pantera L 4 Ford Pantera L
#> Maserati Bora 8 Maserati Bora
However, if you pass the data frame as an argument and hard-code the column name, this works:
getMetric <- function(df, value) {
  df %>%
    as_tibble() %>%
    filter(cyl == value)
}
getMetric(mtcars, 8)
#> # A tibble: 14 × 17
#> mpg cyl disp hp drat wt qsec vs am
#> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 18.7 8 360 175 3.15 3.44 17.0 0 0
#> 2 14.3 8 360 245 3.21 3.57 15.8 0 0
#> 3 16.4 8 276. 180 3.07 4.07 17.4 0 0
#> 4 17.3 8 276. 180 3.07 3.73 17.6 0 0
#> 5 15.2 8 276. 180 3.07 3.78 18 0 0
#> 6 10.4 8 472 205 2.93 5.25 18.0 0 0
#> 7 10.4 8 460 215 3 5.42 17.8 0 0
#> 8 14.7 8 440 230 3.23 5.34 17.4 0 0
#> 9 15.5 8 318 150 2.76 3.52 16.9 0 0
#> 10 15.2 8 304 150 3.15 3.44 17.3 0 0
#> 11 13.3 8 350 245 3.73 3.84 15.4 0 0
#> 12 19.2 8 400 175 3.08 3.84 17.0 0 0
#> 13 15.8 8 351 264 4.22 3.17 14.5 0 1
#> 14 15 8 301 335 3.54 3.57 14.6 0 1
#> # … with 8 more variables: gear <dbl>, carb <dbl>,
#> # cyl_f <fct>, vs_f <fct>, am_f <fct>, gear_f <fct>,
#> # carb_f <fct>, car <chr>
There is another problem. If a column name does not exist in the data frame itself but does exist as a global variable, then the function silently uses the global variable instead of raising an error.
# An object in the global environment
cyl <- 1
# mtcars, excluding the cyl variable
mtcars_noCyl <- mtcars %>%
  select(-cyl)
getMetric <- function(df, value) {
  df %>%
    as_tibble() %>%
    filter(cyl == value)
}
# This silent failure returns a value, not an error!
getMetric(mtcars_noCyl, 8)
#> # A tibble: 0 × 16
#> # … with 16 variables: mpg <dbl>, disp <dbl>, hp <dbl>,
#> # drat <dbl>, wt <dbl>, qsec <dbl>, vs <dbl>, am <fct>,
#> # gear <dbl>, carb <dbl>, cyl_f <fct>, vs_f <fct>,
#> # am_f <fct>, gear_f <fct>, carb_f <fct>, car <chr>
To avoid this, we need to use the .data pronoun to explicitly refer to the data set at hand.
getMetric <- function(df, value) {
  df %>%
    as_tibble() %>%
    filter(.data$cyl == value)
}
getMetric(mtcars_noCyl, 8)
This results in the following error as we expect it should:
Error: Problem with `filter()` input `..1`.
ℹ Input `..1` is `.data$cyl == value`.
x Column `cyl` not found in `.data`
Run `rlang::last_error()` to see where the error occurred.
If we continue with our analogy of dplyr as the grammar of data analysis, we can consider .data as a pronoun. Just like pronouns in regular language, .data refers to something else, here an object inside a data frame, but we make the reference explicit rather than implicit. That is a pretty good idea, as we’ll see below. Using .data is the best practice when using dplyr in functions and is the recommended approach when writing packages. Know also that .data is an internal object in the dplyr grammar. It has nothing to do with the actual name of the data frame, even though we assign our data frame to the argument data in some of our functions.
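To connect this back to the earlier getCyl() example, here is a minimal sketch of how pronouns remove the ambiguity between a column and a function argument of the same name. The function name getCyl_explicit() is hypothetical, and the .env pronoun, the companion of .data that refers to variables outside the data frame, is not otherwise covered in this chapter.

```r
library(dplyr)

# Hypothetical variant of getCyl() that spells out both references
getCyl_explicit <- function(cyl) {
  mtcars %>%
    as_tibble() %>%
    # .data$cyl is the column, .env$cyl is the function argument
    filter(.data$cyl == .env$cyl)
}

getCyl_explicit(8)
```

With both sides labelled, the comparison no longer depends on which object happens to mask the other.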
To appreciate the importance of the .data
pronoun, consider that in this example, we refer to mtcars$cyl implicitly.
# Implicit reference to data frame (mtcars):
mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(mpg))
#> # A tibble: 3 × 2
#> cyl avg
#> <fct> <dbl>
#> 1 4 26.7
#> 2 6 19.7
#> 3 8 15.1
We can do that in a function and it still works fine:
# A problem if this object exists in the
# global environment but not in the data frame
mpg <- 1:4
# Implicit reference to data frame (mtcars):
calcSummary <- function(data) {
  data %>%
    group_by(cyl) %>%
    summarise(avg = mean(mpg))
}
calcSummary(mtcars)
#> # A tibble: 3 × 2
#> cyl avg
#> <fct> <dbl>
#> 1 4 26.7
#> 2 6 19.7
#> 3 8 15.1
But we would prefer to make it explicit:
# Explicit reference to data:
calcSummary <- function(data) {
  data %>%
    group_by(cyl) %>%
    summarise(avg = mean(.data$mpg))
}
calcSummary(mtcars)
#> # A tibble: 3 × 2
#> cyl avg
#> <fct> <dbl>
#> 1 4 26.7
#> 2 6 19.7
#> 3 8 15.1
We can also refer to column names stored as strings, e.g.:
# $ notation takes a literal column name, so it cannot use a name stored in a string.
# Here cyl is a factor, so mean() returns NA with a warning.
mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(.data$cyl))
#> Warning in mean.default(.data$cyl): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(.data$cyl): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(.data$cyl): argument is not numeric
#> or logical: returning NA
#> # A tibble: 3 × 2
#> cyl avg
#> <fct> <dbl>
#> 1 4 NA
#> 2 6 NA
#> 3 8 NA
# Explicit reference using a string works with [[]]
myVar <- "mpg"
mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(.data[[myVar]]))
#> # A tibble: 3 × 2
#> cyl avg
#> <fct> <dbl>
#> 1 4 26.7
#> 2 6 19.7
#> 3 8 15.1
Here, myVar is unquoted, i.e. evaluated as an ordinary variable rather than looked up in the data, meaning that a column named myVar in the data frame cannot mask it. Thus, we can use the .data pronoun safely in functions and packages. .data also respects grouped data frame attributes.
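The same idea gives a tidyverse counterpart to the base R getMetric() shown earlier. This is a minimal sketch, and the function name getMetric_string() is hypothetical.

```r
library(dplyr)

# Hypothetical helper: the column name arrives as a string in `metric`
getMetric_string <- function(df, metric, value) {
  df %>%
    as_tibble() %>%
    # .data[[metric]] looks up the column whose name is stored in metric
    filter(.data[[metric]] == value)
}

getMetric_string(mtcars, "cyl", 8)
```

Because the lookup goes through .data, a missing column raises an error instead of silently falling back to a global variable.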
8.4 Quoting, unquoting and quosures
Consider the following function and two ways of calling it. Which call produces the scatter plot we want?
library(ggplot2)
myScatter <- function(.data, x, y, z = NULL) {
  ggplot(.data) +
    geom_point(aes(x, y)) +
    theme_classic()
}
# Function Call 1:
myScatter(mtcars, wt, mpg)
# Function Call 2:
myScatter(mtcars, "wt", "mpg")
The answer is neither one! Can you explain why?
We wouldn’t usually write the name of a variable with quotation marks when executing the plotting functions in interactive mode. Although we may want to do that in our custom function, it’s not typical and we may rather have the actual variable names instead. Let’s take a look at both in turn.
For function call 1, we’ll show a detailed solution to give you context, and then a shorter, easier version which will be preferable. Let’s begin with the detailed version.
The reason that putting values into an expression in dplyr doesn’t work is that those values are “quoted.” To get around this we can use quosures. A quosure is a special type of formula obtained using the quo() function. It contains a quoted expression and information about the environment where it came from, i.e. the scope. R is lexically scoped, meaning that functions carry with them a reference to the environment within which they were defined.
Here, we make a quosure with the enquo() function. The en part tells us that this is for expressions that the user will supply, e.g. as input to a function.
# To properly use function call 1, above:
myScatter <- function(.data, x, y, z = NULL) {
  x <- enquo(x)
  y <- enquo(y)
  ggplot(.data) +
    geom_point(aes(!!x, !!y)) +
    theme_classic()
}
# Function Call 1:
myScatter(mtcars, wt, mpg)
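To see what enquo() actually captures, the following minimal sketch inspects a quosure directly. The helper capture_arg() and the object quo_wt are hypothetical names; quo_get_expr(), quo_get_env() and eval_tidy() are functions from the rlang package.

```r
library(rlang)

# Hypothetical helper that returns the quosure made from its argument
capture_arg <- function(x) {
  enquo(x)
}

quo_wt <- capture_arg(wt / 2)

quo_get_expr(quo_wt)   # the quoted expression the user typed
quo_get_env(quo_wt)    # the environment the expression came from

# Evaluate the expression with the columns of mtcars in scope
head(eval_tidy(quo_wt, data = mtcars))
```

The expression is not evaluated when capture_arg() is called. It is only evaluated later, once a data frame is supplied.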
This process has been simplified by the introduction of double curly braces, {{}}, which embrace the variable:
# Use embracing to simplify the function and properly use function call 1, above:
myScatter <- function(.data, x, y, z = NULL) {
  ggplot(.data) +
    geom_point(aes({{x}}, {{y}})) +
    theme_classic()
}
# Function Call 1:
myScatter(mtcars, wt, mpg)
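The same equivalence holds in dplyr verbs. As a minimal sketch, the two hypothetical helpers below, maxOf_bang() and maxOf_embrace(), should give the same result, one using enquo() with !! and the other using embracing.

```r
library(dplyr)

# Hypothetical helper using the two-step quote and unquote pattern
maxOf_bang <- function(data, var) {
  var <- enquo(var)
  data %>%
    summarise(max = max(!!var))
}

# Hypothetical helper using embracing, which does both steps at once
maxOf_embrace <- function(data, var) {
  data %>%
    summarise(max = max({{ var }}))
}

maxOf_bang(mtcars, wt)
maxOf_embrace(mtcars, wt)
```

Embracing is shorter, while the two-step form is useful when the captured expression needs to be inspected or modified before it is used.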
Alternatively, we could have used standard evaluation by providing a string. One solution would be to replace aes() with aes_string(), which expects its input as strings:
# To properly use function call 2:
myScatter <- function(.data, x, y, z = NULL) {
  ggplot(.data) +
    geom_point(aes_string(x, y)) +
    theme_classic()
}
# Function Call 2:
myScatter(mtcars, "wt", "mpg")
However, this is a shortcut, and aes_string() is soft-deprecated in recent versions of ggplot2. The more explicit and flexible approach is to refer to the variable by name using the .data pronoun with [[]] notation, as we have seen earlier:
# To properly use function call 2:
myScatter <- function(.data, x, y, z = NULL) {
  ggplot(.data) +
    geom_point(aes(.data[[x]], .data[[y]])) +
    theme_classic()
}
# Function Call 2:
myScatter(mtcars, "wt", "mpg")
Using a character vector to refer to a variable is also an issue in interactive mode. Here, the object var holds the name of the data-variable, but this doesn’t work:
var <- "mpg"
# We force `var`, but this substitutes the string "mpg", not the column mpg
mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(!!var))
#> Warning in mean.default("mpg"): argument is not numeric or
#> logical: returning NA
#> Warning in mean.default("mpg"): argument is not numeric or
#> logical: returning NA
#> Warning in mean.default("mpg"): argument is not numeric or
#> logical: returning NA
#> # A tibble: 3 × 2
#> cyl avg
#> <fct> <dbl>
#> 1 4 NA
#> 2 6 NA
#> 3 8 NA
So we need to convert the string to a symbol using sym() and unquote it using !!:
myVar <- sym("mpg")
# We force `myVar`, which substitutes it with the symbol `mpg`
mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(!!myVar))
#> # A tibble: 3 × 2
#> cyl avg
#> <fct> <dbl>
#> 1 4 26.7
#> 2 6 19.7
#> 3 8 15.1
We need both !! and sym(). sym() turns the string into a symbol, and !! unquotes the symbol or call that follows (say “unquote” or “bang bang”). We can see that here:
a <- 10
b <- 90
expr(log10(!!a + b))
#> log10(10 + b)
expr(log(!!(a + b)))
#> log(100)
log10(a + b)
#> [1] 2
When we are using numbers, i.e. values stored in a variable, we don’t need to use sym():
myQuery1 <- 8
myQuery2 <- 1
mtcars %>%
  filter(cyl == !!myQuery1, am == !!myQuery2)
#> mpg cyl disp hp drat wt qsec vs am gear
#> Ford Pantera L 15.8 8 351 264 4.22 3.17 14.5 0 1 5
#> Maserati Bora 15.0 8 301 335 3.54 3.57 14.6 0 1 5
#> carb cyl_f vs_f am_f gear_f carb_f
#> Ford Pantera L 4 8 0 1 5 4
#> Maserati Bora 8 8 0 1 5 8
#> car
#> Ford Pantera L Ford Pantera L
#> Maserati Bora Maserati Bora
To make things more confusing, you may also encounter !!!, which unquotes a vector or list and splices the results as arguments into the surrounding call. Pronounce this as “unquote splice” or “bang-bang-bang.” In the example below, 8 and base = 2 are spliced in as the arguments for log(). The !!! notation can also be read as triple negation, which is sometimes used to emphasize that a logical vector has been negated, so you can see how this may cause some confusion!
input1 <- 8
input2 <- 2
x <- list(input1, base = input2)
myExpression <- expr(log(!!!x))
myExpression
#> log(8, base = 2)
eval(myExpression)
#> [1] 3
We can see this in action with select(). The following three calls return the same columns.
mtcars %>%
  select(cyl, mpg)
#> cyl mpg
#> Mazda RX4 6 21.0
#> Mazda RX4 Wag 6 21.0
#> Datsun 710 4 22.8
#> Hornet 4 Drive 6 21.4
#> Hornet Sportabout 8 18.7
#> Valiant 6 18.1
#> Duster 360 8 14.3
#> Merc 240D 4 24.4
#> Merc 230 4 22.8
#> Merc 280 6 19.2
#> Merc 280C 6 17.8
#> Merc 450SE 8 16.4
#> Merc 450SL 8 17.3
#> Merc 450SLC 8 15.2
#> Cadillac Fleetwood 8 10.4
#> Lincoln Continental 8 10.4
#> Chrysler Imperial 8 14.7
#> Fiat 128 4 32.4
#> Honda Civic 4 30.4
#> Toyota Corolla 4 33.9
#> Toyota Corona 4 21.5
#> Dodge Challenger 8 15.5
#> AMC Javelin 8 15.2
#> Camaro Z28 8 13.3
#> Pontiac Firebird 8 19.2
#> Fiat X1-9 4 27.3
#> Porsche 914-2 4 26.0
#> Lotus Europa 4 30.4
#> Ford Pantera L 8 15.8
#> Ferrari Dino 6 19.7
#> Maserati Bora 8 15.0
#> Volvo 142E 4 21.4
vars <- c("cyl", "mpg")
mtcars %>%
  select(vars)
#> Note: Using an external vector in selections is ambiguous.
#> ℹ Use `all_of(vars)` instead of `vars` to silence this message.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.
#> cyl mpg
#> Mazda RX4 6 21.0
#> Mazda RX4 Wag 6 21.0
#> Datsun 710 4 22.8
#> Hornet 4 Drive 6 21.4
#> Hornet Sportabout 8 18.7
#> Valiant 6 18.1
#> Duster 360 8 14.3
#> Merc 240D 4 24.4
#> Merc 230 4 22.8
#> Merc 280 6 19.2
#> Merc 280C 6 17.8
#> Merc 450SE 8 16.4
#> Merc 450SL 8 17.3
#> Merc 450SLC 8 15.2
#> Cadillac Fleetwood 8 10.4
#> Lincoln Continental 8 10.4
#> Chrysler Imperial 8 14.7
#> Fiat 128 4 32.4
#> Honda Civic 4 30.4
#> Toyota Corolla 4 33.9
#> Toyota Corona 4 21.5
#> Dodge Challenger 8 15.5
#> AMC Javelin 8 15.2
#> Camaro Z28 8 13.3
#> Pontiac Firebird 8 19.2
#> Fiat X1-9 4 27.3
#> Porsche 914-2 4 26.0
#> Lotus Europa 4 30.4
#> Ford Pantera L 8 15.8
#> Ferrari Dino 6 19.7
#> Maserati Bora 8 15.0
#> Volvo 142E 4 21.4
vars <- syms(c("cyl", "mpg"))
mtcars %>% select(!!!vars)
#> cyl mpg
#> Mazda RX4 6 21.0
#> Mazda RX4 Wag 6 21.0
#> Datsun 710 4 22.8
#> Hornet 4 Drive 6 21.4
#> Hornet Sportabout 8 18.7
#> Valiant 6 18.1
#> Duster 360 8 14.3
#> Merc 240D 4 24.4
#> Merc 230 4 22.8
#> Merc 280 6 19.2
#> Merc 280C 6 17.8
#> Merc 450SE 8 16.4
#> Merc 450SL 8 17.3
#> Merc 450SLC 8 15.2
#> Cadillac Fleetwood 8 10.4
#> Lincoln Continental 8 10.4
#> Chrysler Imperial 8 14.7
#> Fiat 128 4 32.4
#> Honda Civic 4 30.4
#> Toyota Corolla 4 33.9
#> Toyota Corona 4 21.5
#> Dodge Challenger 8 15.5
#> AMC Javelin 8 15.2
#> Camaro Z28 8 13.3
#> Pontiac Firebird 8 19.2
#> Fiat X1-9 4 27.3
#> Porsche 914-2 4 26.0
#> Lotus Europa 4 30.4
#> Ford Pantera L 8 15.8
#> Ferrari Dino 6 19.7
#> Maserati Bora 8 15.0
#> Volvo 142E 4 21.4
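The message printed by select(vars) above suggests a third option for character vectors. As a minimal sketch, all_of() from tidyselect, which dplyr makes available, marks vars as an external vector of column names, so no conversion to symbols is needed.

```r
library(dplyr)

vars <- c("cyl", "mpg")

# all_of() tells select() that vars is an external character vector
mtcars %>%
  select(all_of(vars)) %>%
  head(3)
```

Unlike the bare select(vars) call, this form is unambiguous even if the data frame happens to contain a column called vars.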
As an exercise, complete the body of the following function:
getMean <- function(data, group_var, summary_var) {
}
The following commands should work:
getMean(mtcars, cyl, mpg)
#> # A tibble: 3 × 2
#> cyl mean
#> <fct> <dbl>
#> 1 4 26.7
#> 2 6 19.7
#> 3 8 15.1
getMean(mtcars, am, hp)
#> # A tibble: 2 × 2
#> am mean
#> <fct> <dbl>
#> 1 0 160.
#> 2 1 127.
getMean(iris, Species, Petal.Length)
#> # A tibble: 3 × 2
#> Species mean
#> <fct> <dbl>
#> 1 setosa 1.46
#> 2 versicolor 4.26
#> 3 virginica 5.55