Chapter 9 Element 4: Logical Expressions

9.1 Learning Objectives

By the end of this section, you should understand how to

Ask specific YES/NO questions of your data, and
Combine multiple questions in different ways.

Logical expressions allow us to ask specific questions about our data and are a central component of data analysis. They are composed of relational and logical operators.

9.2 Relational Operators

Relational operators always ask “yes/no” questions. In place of yes and no, R answers with the logical values TRUE and FALSE. T and F are accepted as abbreviations, and TRUE and FALSE behave like 1 and 0 in arithmetic. The most common relational operators are listed in Table 9.1:

Table 9.1: A summary of relational operators in R.
Operator	Description
`<`	Less than
`<=`	Less than or equal to
`>`	Greater than
`>=`	Greater than or equal to
`==`	Exactly equal to
`!=`	Not equal to, i.e. the opposite of `==`
`!x`	Not x (logical negation)

As an example, we can apply some logical tests to n and p.

n # What is n?

# [1] 34

p # What is p?

# [1] 6

n > p # Is n greater than p? TRUE

# [1] TRUE

n %% 2 == 0 # Is n even? TRUE

# [1] TRUE

n %% 2 != 0 # Is n odd? FALSE

# [1] FALSE
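
Relational operators are vectorised: applied to a vector, they answer the question once per element and return a logical vector. A minimal sketch, using a hypothetical vector `counts` that is not part of the examples above:

```r
# Hypothetical example vector (not part of the chapter's data)
counts <- c(3, 12, 7, 20, 15)

# One TRUE/FALSE answer per element
counts > 10
# [1] FALSE  TRUE FALSE  TRUE  TRUE

# TRUE counts as 1 and FALSE as 0, so sum() counts the "yes" answers
sum(counts > 10)
# [1] 3

# ... and mean() gives the fraction of "yes" answers
mean(counts > 10)
# [1] 0.6
```

Counting and averaging logical vectors in this way is one possible route to "how many" and "what fraction" questions about a data-set.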

As we’ve already seen, an important feature is that we can replace an object with an updated version of itself. Be very careful here! If you accidentally change the value of an object, all the following calculations will be affected.

n # What is n?

# [1] 34

n == p # Is n equal to p?

# [1] FALSE

n <- p # Assign the value of p to n
n # What is the new value of n?

# [1] 6

n == p # Is n equal to p?

# [1] TRUE

n <- n + 1 # Replace n with an updated version of itself
n # What is the new n?

# [1] 7

n == p # Is n equal to p?

# [1] FALSE

9.3 Logical Operators

A collection of logical expressions can be combined using the following logical operators.

Table 9.2 lists the most common logical operators. x %in% y is not a logical operator per se, but it is convenient to introduce it at this point. You’ll understand why when we see it in action.

Table 9.2: A summary of logical operators in R. `x` and `y` are logical vectors, e.g. the output from *relational operators*. `a` and `b` are vectors, typically of type `character`.
Operator	Description
`x | y`	x OR y
`x & y`	x AND y
`b %in% a`	For each element of `b`, is it also found in `a`?
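
As a small illustration of Table 9.2, the sketch below applies each operator to short, hypothetical vectors (`x`, `y`, `a` and `b` are made up for this example and mirror the names used in the table caption):

```r
# Hypothetical logical vectors, e.g. the output of relational operators
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

x | y   # TRUE where at least one of x, y is TRUE
# [1]  TRUE  TRUE  TRUE FALSE

x & y   # TRUE only where both x and y are TRUE
# [1]  TRUE FALSE FALSE FALSE

!x      # negation flips every element
# [1] FALSE FALSE  TRUE  TRUE

# Hypothetical character vectors
a <- c("Heart", "Liver")
b <- c("Liver", "Brain", "Heart")

b %in% a  # one answer per element of b: is it present in a?
# [1]  TRUE FALSE  TRUE
```

Note that the result of `b %in% a` always has the same length as `b`, the vector on the left-hand side.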

9.4 Using logical expressions on different atomic vector types

Use subset() to select values corresponding to TRUE. This could be a logical vector, or a logical expression containing a relational operator, which will result in a logical vector.³⁹

subset(foo.df, foo.df$healthy == T)

#   healthy    tissue quantity
# 1    TRUE     Liver        1
# 4    TRUE    Muscle       19
# 5    TRUE Intestine       25

subset(foo.df, healthy == T)

#   healthy    tissue quantity
# 1    TRUE     Liver        1
# 4    TRUE    Muscle       19
# 5    TRUE Intestine       25

subset(foo.df, healthy)

#   healthy    tissue quantity
# 1    TRUE     Liver        1
# 4    TRUE    Muscle       19
# 5    TRUE Intestine       25

⁴⁰

# only those observations with
# low-quantity:
subset(foo.df, quantity < 10)

#   healthy tissue quantity
# 1    TRUE  Liver        1
# 2   FALSE  Brain        7

# a middle quantity (10-20):
subset(foo.df, quantity >= 10 & quantity <= 20)

#   healthy tissue quantity
# 3   FALSE Testes       13
# 4    TRUE Muscle       19

# all values (every quantity is >= 10 or <= 20):
subset(foo.df, quantity >= 10 | quantity <= 20)

#   healthy    tissue quantity
# 1    TRUE     Liver        1
# 2   FALSE     Brain        7
# 3   FALSE    Testes       13
# 4    TRUE    Muscle       19
# 5    TRUE Intestine       25
# 6   FALSE     Heart       31

# tail values:
subset(foo.df, quantity < 10 | quantity > 20)

#   healthy    tissue quantity
# 1    TRUE     Liver        1
# 2   FALSE     Brain        7
# 5    TRUE Intestine       25
# 6   FALSE     Heart       31

# Impossible: no quantity is both < 10 and > 20
subset(foo.df, quantity < 10 & quantity > 20)

# [1] healthy  tissue   quantity
# <0 rows> (or 0-length row.names)

# Heart values -  simple, only 1 value
subset(foo.df, tissue == "Heart")

#   healthy tissue quantity
# 6   FALSE  Heart       31

So what happens when we have 2 (or many) values we want to search?

# Heart and Liver - cheap way
subset(foo.df, tissue == "Heart" | tissue == "Liver")

#   healthy tissue quantity
# 1    TRUE  Liver        1
# 6   FALSE  Heart       31

You can probably appreciate that when you only have two queries, that is OK, but what happens when you have many? Maybe they are stored in a vector, perhaps the result from sub-setting another data frame. You may be tempted to do the following:

# Terrible way - NEVER do this!
subset(foo.df, tissue == c("Liver", "Heart"))

#   healthy tissue quantity
# 1    TRUE  Liver        1
# 6   FALSE  Heart       31

But you should NEVER do that. It only appeared to work by coincidence, as you can see when you change the order of the query:

# Terrible way - NEVER do this!
subset(foo.df, tissue == c("Heart", "Liver"))

# [1] healthy  tissue   quantity
# <0 rows> (or 0-length row.names)
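
The reason is that `==` compares element by element and recycles the shorter vector to the length of the longer one; it never asks whether a value appears anywhere in the query. The sketch below rebuilds `foo.df` from the printed output above (an assumption about how it was originally created) and makes the recycling explicit through a helper vector called `query`, which is introduced here only for illustration:

```r
# foo.df rebuilt from the printed output above (assumed construction)
foo.df <- data.frame(
  healthy  = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  tissue   = c("Liver", "Brain", "Testes", "Muscle", "Intestine", "Heart"),
  quantity = c(1, 7, 13, 19, 25, 31)
)

# The two-element query is recycled to length 6 before comparing
query <- rep(c("Heart", "Liver"), length.out = nrow(foo.df))
query
# [1] "Heart" "Liver" "Heart" "Liver" "Heart" "Liver"

# Row 1 ("Liver") meets "Heart" and row 6 ("Heart") meets "Liver"
foo.df$tissue == query
# [1] FALSE FALSE FALSE FALSE FALSE FALSE

# Same answer as the implicit version
identical(foo.df$tissue == query, foo.df$tissue == c("Heart", "Liver"))
# [1] TRUE
```

With the query in the order "Liver", "Heart", rows 1 and 6 happen to line up with matching values, which is why the first attempt looked correct.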

So how do we search for many elements at the same time? Recall the %in% operator:

# Heart and Liver - nice way
subset(foo.df, tissue %in% c("Heart", "Liver"))

#   healthy tissue quantity
# 1    TRUE  Liver        1
# 6   FALSE  Heart       31
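
Unlike `==`, the result of `%in%` does not depend on the order or length of the query. The following sketch, again using `foo.df` rebuilt from the printed output above (an assumed construction) and a hypothetical query vector `wanted`, also shows how to invert the selection with `!`:

```r
# foo.df rebuilt from the printed output above (assumed construction)
foo.df <- data.frame(
  healthy  = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  tissue   = c("Liver", "Brain", "Testes", "Muscle", "Intestine", "Heart"),
  quantity = c(1, 7, 13, 19, 25, 31)
)

# A hypothetical query vector; its order and length do not matter
wanted <- c("Heart", "Liver", "Intestine")

subset(foo.df, tissue %in% wanted)
#   healthy    tissue quantity
# 1    TRUE     Liver        1
# 5    TRUE Intestine       25
# 6   FALSE     Heart       31

# Negate the test with ! to keep everything else
subset(foo.df, !(tissue %in% wanted))
#   healthy tissue quantity
# 2   FALSE  Brain        7
# 3   FALSE Testes       13
# 4    TRUE Muscle       19
```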

9.5 Summary of Operators

So far we’ve encountered several operator types. Let’s recap:

Table 9.3: A summary of common operators in R.
Operator type	Operator	Description	Reference
Assign	`<-`	Create an object	Section 3.6
Colon	`:`	Create a series of integers (increments of 1)	Section 6.5
Arithmetic	`+`, `-`, `/`, `^`, `*`, `%/%`, `%%`	Do math	Table 3.1
Relational	`>=`, `>`, `<=`, `<`, `==`, `!=`, `!x`	Ask TRUE/FALSE questions	Table 9.1
Logical	`&`, `&&`, `|`, `||`, `%in%`	Combine questions	Table 9.2
Double colon	`::`	Call a function within a specific package	Section 5.7
Pipe	`%>%`	Pass the result of one line of code on to the next
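
Table 9.3 lists `&&` and `||` alongside `&` and `|`, although only the single forms are used in this chapter. As a brief sketch of the difference: the single forms work element by element on whole vectors, while the double forms expect a single TRUE/FALSE on each side and stop evaluating as soon as the answer is known. The value of `n` below is a hypothetical stand-in.

```r
n <- 7  # hypothetical value

# Element-wise: one answer per element
c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)
# [1]  TRUE FALSE FALSE

# Single answer from two length-one conditions
n > 5 && n %% 2 != 0
# [1] TRUE

# Short-circuit: the right-hand side is never evaluated,
# so the stop() call does not raise an error
n < 5 && stop("not reached")
# [1] FALSE

n > 5 || stop("not reached")
# [1] TRUE
```

For filtering rows with subset(), as in the examples above, the element-wise forms `&` and `|` are the ones to use.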

9.6 Exercises for Logical Expressions

Complete the following exercises.

Exercise 9.1 (Remove contaminants) Returning to the protein.df data frame, the Contaminant column contains a categorical variable with two possible values: "+" identifies contamination and blanks "" identify real proteins. First, answer the following questions:

How many contaminants are present in the data-set?
What fraction of the total do they represent?

Finally, remove all contaminants from the data-set, and save the new, clean data-set under the same name.

Try to solve the following exercises using the material covered so far. We’ll return to these again later on and see if we can simplify the process.

Exercise 9.2 (Find protein values) Given a list of Uniprot IDs:

GOGA7
PSA6
S10AB

Find the corresponding \(\log_2\) ratios for each of the two conditions of interest (H/M, M/L).

Exercise 9.3 (Find significant hits) The columns ending with Sig contain p-values from hypothesis testing to determine how likely the corresponding ratio would be observed by chance. For the H/M ratio column, create a new data frame containing only proteins that have a p-value less than 0.05.

Exercise 9.4 (Find extreme values) For the H/M ratio column, create a new data frame containing only proteins that have a \(\log_2\) ratio above 2.0 or below -2.0.

9.7 The Floating Point Trap

Numbers with decimal places are stored as type double in R (and in computing in general); they are also referred to as “floating point numbers”. These names refer to the way numeric values are stored in memory.⁴¹

For example, we know that 0.3/3 equates to 0.1:

0.3 / 3

# [1] 0.1

but it turns out that’s not true when we check using a relational operator:

0.1 == 0.3 / 3 # FALSE

# [1] FALSE

That’s really bad news if you’re basing your answers on this result! Why does this occur?

First off, to check whether two floating point numbers are equal for practical purposes, we can use all.equal():

# Compare floating point numbers within a small tolerance:
all.equal(0.1, 0.3 / 3) # TRUE

# [1] TRUE
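
all.equal() tests for near equality within a small numerical tolerance rather than exact equality. When the two values differ by more than that tolerance, it returns a character description of the difference instead of FALSE, so it is commonly wrapped in isTRUE() whenever a strict TRUE/FALSE answer is needed. A short sketch, where `x` is just a convenience name for this example:

```r
x <- 0.3 / 3

# Near equality: TRUE because the difference is far below the tolerance
isTRUE(all.equal(0.1, x))
# [1] TRUE

# A real difference returns a character string describing it, not FALSE ...
is.character(all.equal(0.1, 0.2))
# [1] TRUE

# ... so wrap the call in isTRUE() to get a logical answer
isTRUE(all.equal(0.1, 0.2))
# [1] FALSE
```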

Whenever we execute a command without assigning the output to an object, the output appears in one of two places: the graphical device for plots or, as in this case, the console (or, in the case of an R Markdown or LaTeX document, whatever output document you are generating). Under the hood, R is calling the print() function, which shows only a limited number of significant digits by default. If we increase the number of significant digits it shows, we’ll see why the results are not the same:

print(0.3 / 3, digits = 16)

# [1] 0.09999999999999999

So that means if we round the result, it should equate to 0.1:

0.1 == round(0.3 / 3, 2) # TRUE

# [1] TRUE

Let’s take a look at some more involved examples. In the following sequence:

seq(0, 1, by=.1)

#  [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

There appears to be exactly one 0.3, but when I test for it:

seq(0, 1, by=.1) == .3

#  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [11] FALSE

I discover that there are none! In another example, all of the following expressions should evaluate to 0.3:

# All 0.3:
all_0.3 <- c(.3,
             .4 - .1,
             .5 - .2,
             .6 - .3,
             .7 - .4)
all_0.3

# [1] 0.3 0.3 0.3 0.3 0.3

But when I look for the unique values here, expecting a single 0.3, I get three 0.3s.⁴²

unique(all_0.3)

# [1] 0.3 0.3 0.3
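
Two common workarounds are sketched below, under the assumption that differences smaller than a chosen tolerance are irrelevant for your data: compare against a tolerance instead of using `==`, or round to a sensible number of decimal places before looking for unique values. The tolerance `tol` and the choice of 10 decimal places are illustrative, not recommendations.

```r
tol <- 1e-8  # illustrative tolerance

# Vectorised near-equality instead of ==
s <- seq(0, 1, by = .1)
which(abs(s - .3) < tol)
# [1] 4

# Round away the storage noise before asking for unique values
all_0.3 <- c(.3, .4 - .1, .5 - .2, .6 - .3, .7 - .4)
unique(round(all_0.3, 10))
# [1] 0.3
```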

This is not an R-specific problem - it’s a computing problem. If you are going to ask questions like the ones above, you need to be aware that the answer may be affected by the storage method of the data.

³⁹ Remember, anytime we use a relational operator, we’ll get a logical vector as output.
⁴⁰ This example already gives us a preview of how we can extract specific pieces of information from a data-set. This topic will be explored further later on.
⁴¹ For more information on this and other terrible R problems, see “R Inferno”, by Burns Stats.
⁴² We see why this occurs when we print more digits, e.g. print(all_0.3, digits = 17).