Chapter 7 Logical Expressions

7.1 Learning Objectives

By the end of this section, you should understand how to

  • Ask specific YES/NO questions of your data, and
  • Combine multiple questions in different ways.

Logical expressions allow us to ask specific questions about our data and are a central component of data analysis. They are composed of relational and logical operators.

7.2 Relational Operators

Relational operators always ask “yes/no” questions. In place of yes and no, R answers with the logical values TRUE and FALSE. These can be abbreviated to T and F, and they behave like 1 and 0 in arithmetic. The most common relational operators are listed in Table 7.1:

Table 7.1: A summary of relational operators in R.
Operator   Description
<          Less than
<=         Less than or equal to
>          Greater than
>=         Greater than or equal to
==         Exactly equal to
!=         Not equal to, i.e. the opposite of ==
!x         Not x (logical negation)

As an example, we can apply some logical tests to n and p.

n # What is n?
## [1] 34
p # What is p?
## [1] 6
n > p # Is n greater than p? TRUE
## [1] TRUE
n %% 2 == 0 # Is n even? TRUE
## [1] TRUE
n %% 2 != 0 # Is n odd? FALSE
## [1] FALSE
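
Relational operators are vectorised, so a single comparison is applied to every element of a vector at once, and the resulting TRUE/FALSE values behave like 1/0 in arithmetic. A minimal sketch, assuming a made-up vector named counts that does not appear elsewhere in this chapter:

```r
# Hypothetical data, for illustration only
counts <- c(2, 9, 4, 7)

# One comparison per element; the result is a logical vector
is_large <- counts > 3
is_large
## [1] FALSE  TRUE  TRUE  TRUE

# TRUE counts as 1 and FALSE as 0 in arithmetic
sum(is_large)   # How many values are greater than 3?
## [1] 3
mean(is_large)  # What proportion are greater than 3?
## [1] 0.75
```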

As we’ve already seen, an important feature is that we can replace an object with an updated version of itself. Be very careful here! If you accidentally change the value of an object, all the following calculations will be affected.

n # What is n?
## [1] 34
n == p # Is n equal to p?
## [1] FALSE
n <- p # Assign the value of p to n
n # What is the new value of n?
## [1] 6
n == p # Is n equal to p?
## [1] TRUE
n <- n+1
n # What is the new n?
## [1] 7
n == p # Is n equal to p?
## [1] FALSE

7.3 Logical Operators

A collection of logical expressions can be combined using the following logical operators.

Table 7.2 lists the most common logical operators. x %in% y is not a logical operator per se, but it is convenient to introduce it at this point. You’ll understand why when we see it in action.

Table 7.2: A summary of logical operators in R. x and y are logical vectors, e.g. the output from relational operators. a and b are vectors, typically of type character.
Operator   Description
x | y      x OR y
x & y      x AND y
b %in% a   For each element of b, is it present in a?
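
The operators in Table 7.2 can be tried out on small vectors. The sketch below reuses the names x, y, a and b from the table caption; the values themselves are made up for illustration.

```r
# Hypothetical logical vectors covering all four combinations
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

x | y  # TRUE where at least one of x and y is TRUE
## [1]  TRUE  TRUE  TRUE FALSE
x & y  # TRUE only where both x and y are TRUE
## [1]  TRUE FALSE FALSE FALSE
!x     # Negation flips every element
## [1] FALSE FALSE  TRUE  TRUE

# Hypothetical character vectors
a <- c("Heart", "Liver")
b <- c("Liver", "Brain", "Heart")
b %in% a  # One answer per element of b
## [1]  TRUE FALSE  TRUE
```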

7.4 Using logical expressions on different atomic vector types

Enter logical expressions into filter() to select the rows for which the expression is TRUE. The expression can be a logical vector, or a logical expression containing a relational operator, which will result in a logical vector. (Remember, anytime we use a relational operator, we’ll get a logical vector as output.) You will also see subset() used in a similar fashion; it belongs to the base package and has fallen out of use.
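
The examples in this section use foo_df, which is not defined in this chapter. For readers following along from here, this is a sketch that rebuilds it from the output printed in this section. The row order is taken from the "all values" example, and a base data.frame is used instead of a tibble so that the sketch runs without extra packages; printed output will therefore look slightly different.

```r
# Reconstructed from the printed output in this section (an assumption,
# not the book's original definition of foo_df)
foo_df <- data.frame(
  healthy  = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  tissue   = c("Liver", "Brain", "Testes", "Muscle", "Intestine", "Heart"),
  quantity = c(1, 7, 13, 19, 25, 31),
  stringsAsFactors = FALSE
)

# The healthy column is already a logical vector ...
foo_df$healthy
## [1]  TRUE FALSE FALSE  TRUE  TRUE FALSE

# ... and a relational operator produces a new one
foo_df$quantity < 10
## [1]  TRUE  TRUE FALSE FALSE FALSE FALSE
```

Once dplyr is loaded with library(dplyr), this data frame can be passed to filter() in the same way as in the examples.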

foo_df %>% 
  filter(healthy == TRUE)
## # A tibble: 3 × 3
##   healthy tissue    quantity
##   <lgl>   <chr>        <dbl>
## 1 TRUE    Liver            1
## 2 TRUE    Muscle          19
## 3 TRUE    Intestine       25
foo_df %>% 
  filter(healthy)
## # A tibble: 3 × 3
##   healthy tissue    quantity
##   <lgl>   <chr>        <dbl>
## 1 TRUE    Liver            1
## 2 TRUE    Muscle          19
## 3 TRUE    Intestine       25

[30]

# only those observations with
# low-quantity:
foo_df %>% 
  filter(quantity < 10)
## # A tibble: 2 × 3
##   healthy tissue quantity
##   <lgl>   <chr>     <dbl>
## 1 TRUE    Liver         1
## 2 FALSE   Brain         7
# a middle quantity (10-20):
foo_df %>% 
  filter(quantity >= 10 & quantity <= 20)
## # A tibble: 2 × 3
##   healthy tissue quantity
##   <lgl>   <chr>     <dbl>
## 1 FALSE   Testes       13
## 2 TRUE    Muscle       19
# Alternatively, separate the conditions with a comma,
# which filter() combines with AND:
foo_df %>% 
  filter(quantity >= 10, quantity <= 20)
## # A tibble: 2 × 3
##   healthy tissue quantity
##   <lgl>   <chr>     <dbl>
## 1 FALSE   Testes       13
## 2 TRUE    Muscle       19
# all values:
foo_df %>% 
  filter(quantity >= 10 | quantity <= 20)
## # A tibble: 6 × 3
##   healthy tissue    quantity
##   <lgl>   <chr>        <dbl>
## 1 TRUE    Liver            1
## 2 FALSE   Brain            7
## 3 FALSE   Testes          13
## 4 TRUE    Muscle          19
## 5 TRUE    Intestine       25
## 6 FALSE   Heart           31
# tail values:
foo_df %>% 
  filter(quantity < 10 | quantity > 20)
## # A tibble: 4 × 3
##   healthy tissue    quantity
##   <lgl>   <chr>        <dbl>
## 1 TRUE    Liver            1
## 2 FALSE   Brain            7
## 3 TRUE    Intestine       25
## 4 FALSE   Heart           31
# Impossible
foo_df %>% 
  filter(quantity < 10 & quantity > 20)
## # A tibble: 0 × 3
## # … with 3 variables: healthy <lgl>, tissue <chr>, quantity <dbl>
# Heart values -  simple, only 1 value
foo_df %>% 
  filter(tissue == "Heart")
## # A tibble: 1 × 3
##   healthy tissue quantity
##   <lgl>   <chr>     <dbl>
## 1 FALSE   Heart        31

So what happens when we have 2 (or many) values we want to search?

# Heart and Liver - cheap way
foo_df %>% 
  filter(tissue == "Heart" | tissue == "Liver")
## # A tibble: 2 × 3
##   healthy tissue quantity
##   <lgl>   <chr>     <dbl>
## 1 TRUE    Liver         1
## 2 FALSE   Heart        31

You can probably appreciate that when you only have two queries, that is OK, but what happens when you have many? Maybe they are stored in a vector, perhaps the result from sub-setting another data frame. You may be tempted to do the following:

# Terrible way - NEVER do this!
foo_df %>% 
  filter(tissue == c("Liver", "Heart"))
## # A tibble: 2 × 3
##   healthy tissue quantity
##   <lgl>   <chr>     <dbl>
## 1 TRUE    Liver         1
## 2 FALSE   Heart        31

But you should NEVER do that. You’ll see why when you change the order of the query:

# Terrible way - NEVER do this!
foo_df %>% 
  filter(tissue == c("Heart", "Liver"))
## # A tibble: 0 × 3
## # … with 3 variables: healthy <lgl>, tissue <chr>, quantity <dbl>
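
The reason is vector recycling: == compares element by element and, when one side is shorter, R repeats it until the lengths match. The sketch below shows this in base R, assuming the six tissue values in the order they appear in the foo_df output above; the vector names tissue and query are chosen here for illustration.

```r
tissue <- c("Liver", "Brain", "Testes", "Muscle", "Intestine", "Heart")
query  <- c("Heart", "Liver")

# What R actually compares tissue against: the query repeated to length 6
rep(query, length.out = length(tissue))
## [1] "Heart" "Liver" "Heart" "Liver" "Heart" "Liver"

# No position lines up, so nothing matches
tissue == query
## [1] FALSE FALSE FALSE FALSE FALSE FALSE

# Reversing the query happens to line up at positions 1 and 6
tissue == rev(query)
## [1]  TRUE FALSE FALSE FALSE FALSE  TRUE
```

Whether a row matches depends on where each value happens to sit in the column, not on whether it is present in the query.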

So how do we search for many elements at the same time? Recall the %in% operator:

# Heart and Liver - nice way
foo_df %>% 
  filter(tissue %in% c("Heart", "Liver"))
## # A tibble: 2 × 3
##   healthy tissue quantity
##   <lgl>   <chr>     <dbl>
## 1 TRUE    Liver         1
## 2 FALSE   Heart        31
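
Because %in% asks, for each element on the left, whether it appears anywhere on the right, the order and length of the query no longer matter. A short base R sketch, assuming the same six tissue values as in foo_df and a hypothetical query vector named wanted:

```r
tissue <- c("Liver", "Brain", "Testes", "Muscle", "Intestine", "Heart")
wanted <- c("Heart", "Liver")  # Hypothetical query vector

tissue %in% wanted
## [1]  TRUE FALSE FALSE FALSE FALSE  TRUE

# Reversing the query gives the same answer
identical(tissue %in% wanted, tissue %in% rev(wanted))
## [1] TRUE

# Negate the result to keep everything except the query
tissue[!(tissue %in% wanted)]
## [1] "Brain"     "Testes"    "Muscle"    "Intestine"
```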

7.5 Summary of Operators

So far we’ve encountered several operator types. Let’s recap:

Table 7.3: A summary of common operators in R.
Operator type   Operators                    Description                                        Reference
Assign          <-                           Create an object                                   Section 2.3
Colon           :                            Create a series of integers (increments of 1)      Section 5.5
Arithmetic      +, -, *, /, ^, **, %/%, %%   Do math                                            Table 2.1
Relational      >=, >, <=, <, ==, !=, !x     Ask TRUE/FALSE questions                           Table 7.1
Logical         &, &&, |, ||, %in%           Combine questions                                  Table 7.2
Double colon    ::                           Call a function within a specific package          Section 10.7
Pipe            %>%                          Pass the result of one line of code to the next    Example ??, 12.3

7.6 Exercises for Logical Expressions

See the chapter on the diamonds data set and complete the exercises for logical expressions.

7.7 The Floating Point Trap

Real numbers, with decimal places, are type double in R (and in computing in general); they are also referred to as “floating point numbers.” These names refer to the way numeric values are stored in memory.[31]

For example, we know that 0.3/3 equates to 0.1:

0.3 / 3
## [1] 0.1

but it turns out that’s not true when we check using a relational operator:

0.1 == 0.3 / 3 # FALSE
## [1] FALSE

That’s really bad news if you’re basing your answers on this result! Why does this occur?

First off, to check whether two numbers are equal within a small tolerance, we can use all.equal():

# Compare floating point numbers within a tolerance:
all.equal(0.1, 0.3 / 3) # TRUE
## [1] TRUE
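
One caveat: when the values differ, all.equal() does not return FALSE but a character string describing the difference, so it is usually wrapped in isTRUE() when a strict TRUE/FALSE answer is needed. A minimal sketch; the tolerance 1e-8 in the last line is an arbitrary choice for illustration:

```r
isTRUE(all.equal(0.1, 0.3 / 3))  # Equal within tolerance
## [1] TRUE

# Not equal: the result is a description, not FALSE
is.character(all.equal(0.1, 0.2))
## [1] TRUE
isTRUE(all.equal(0.1, 0.2))
## [1] FALSE

# The same idea written out by hand: is the difference negligible?
abs(0.1 - 0.3 / 3) < 1e-8
## [1] TRUE
```

For comparisons inside filter(), dplyr provides near(), which applies the same kind of tolerance element by element.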

Whenever we execute a command without assigning the output to an object, the output appears in one of two places: the graphics device for plots or, as in this case, the console (or, in an R Markdown or LaTeX document, whatever output document you are generating). Under the hood, R is calling print(). If we increase the number of significant digits it shows, we’ll see why the results are not the same:

print(0.3 / 3, digits = 16)
## [1] 0.09999999999999999

So that means if we round the result, it should equate to 0.1:

0.1 == round(0.3 / 3, 2) # TRUE
## [1] TRUE

Let’s take a look at some more involved examples. In the following sequence:

seq(0, 1, by=.1)
##  [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

There appears to be exactly one 0.3, but when I ask to see it:

seq(0, 1, by=.1) == .3
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
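
A tolerance-based comparison recovers the expected answer. This is a sketch rather than a general recipe; tol is an assumed tolerance that should be chosen to suit the scale of the data, and the names s and near_0.3 are introduced here for illustration.

```r
s   <- seq(0, 1, by = 0.1)
tol <- 1e-8  # Assumed tolerance

# TRUE where the value is 0.3 for practical purposes
near_0.3 <- abs(s - 0.3) < tol
which(near_0.3)  # Position of the matching element
## [1] 4
sum(near_0.3)    # Number of matches
## [1] 1
```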

I discover that there are none! In another example, all of the following expressions evaluate to 0.3:

# All 0.3:
all_0.3 <- c(.3,
             .4 - .1,
             .5 - .2,
             .6 - .3,
             .7 - .4)
all_0.3
## [1] 0.3 0.3 0.3 0.3 0.3

But when I look for unique values here, expecting a single 0.3, I get three 0.3s.[32]

unique(all_0.3)
## [1] 0.3 0.3 0.3
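
Printing with more digits shows the small differences behind the three values, and rounding to a sensible number of decimal places before calling unique() collapses them. A sketch, where rounding to 10 decimal places is an arbitrary choice:

```r
all_0.3 <- c(.3, .4 - .1, .5 - .2, .6 - .3, .7 - .4)

# Show the stored values to 17 significant digits
print(all_0.3, digits = 17)

# Without rounding there are three distinct stored values
length(unique(all_0.3))
## [1] 3

# Round away the representation error, then look for unique values
unique(round(all_0.3, 10))
## [1] 0.3
```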

This is not an R-specific problem - it’s a computing problem. If you are going to ask questions like the ones above, you need to be aware that the answer may be affected by the storage method of the data.


  30. This example already gives us a preview of how we can extract specific pieces of information from a data-set using . This topic will be explored further with and .

  31. For more information on this and other terrible R problems, see “The R Inferno” by Patrick Burns (Burns Statistics).

  32. We see why this occurs when we print more digits, e.g. print(all_0.3, digits = 17).