# Chapter 9 Element 3: Logical Expressions

## 9.1 Learning Objectives

By the end of this section, you should understand how to

- Ask specific YES/NO questions of your data, and
- Combine multiple questions in different ways.

Logical expressions allow us to ask specific questions about our data and are a central component of data analysis. They are composed of *relational* and *logical operators*.

## 9.2 Relational Operators

Relational operators always ask “yes/no” questions. In place of yes and no, R uses either `TRUE/FALSE`, `T/F`, or `1/0` to provide a positive or negative answer. The most common *relational operators* are listed in Table 9.1:

Operator | Description |
---|---|
`<` | Less than |
`<=` | Less than or equal to |
`>` | Greater than |
`>=` | Greater than or equal to |
`==` | Exactly equal to |
`!=` | Not equal to, i.e. the opposite of `==` |
`!x` | Not x (logical negation) |

As an example, we can apply some logical tests to `n` and `p`.

`# [1] 34`

`# [1] 6`

`# [1] TRUE`

`# [1] TRUE`

`# [1] FALSE`
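The commands behind the results above are not shown here. As a minimal sketch, assuming `n` holds 34 and `p` holds 6 (as the first two results suggest), tests like the following would give the same `TRUE`, `TRUE`, `FALSE` pattern. The particular comparisons are illustrative guesses, not necessarily the ones originally used:

```r
# Assumed values, taken from the first two printed results above
n <- 34
p <- 6

n > p    # is n greater than p?        TRUE
n != p   # is n different from p?      TRUE
n == p   # is n exactly equal to p?    FALSE
```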

As we’ve already seen, an important feature is that we can replace an object with an updated version of itself. Be very careful here! If you accidentally change the value of an object, all the following calculations will be affected.

`# [1] 34`

`# [1] FALSE`

`# [1] 6`

`# [1] TRUE`

`# [1] 7`

`# [1] FALSE`
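To make the danger concrete, here is a small sketch in which `p` is overwritten with an updated version of itself. The starting value of 6 and the `+ 1` update are assumptions chosen to match the printed values above:

```r
p <- 6
before <- p == 6   # TRUE: p is still 6

p <- p + 1         # p is replaced by an updated version of itself
p                  # 7

after <- p == 6    # FALSE: the same question now has a different answer
```

Every calculation that uses `p` after the reassignment sees 7, not 6, which is why an accidental overwrite can silently change later results.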

## 9.3 Logical Operators

A collection of logical expressions can be combined using the following *logical operators*.

Table 9.2 The most common logical operators. `x %in% y` is not a logical operator *per se*, but it is convenient to introduce it at this point. You’ll understand why when we see it in action.

Operator | Description |
---|---|
`x \| y` | x OR y |
`x & y` | x AND y |
`b %in% a` | For each element of `b`, is it found anywhere in `a`? |
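The operators in Table 9.2 can be tried out on a small numeric vector. This is only a sketch: the vector `x` and the cut-off values are made up for illustration.

```r
x <- c(1, 7, 13, 19, 25, 31)    # hypothetical values

either <- x < 10 | x > 30       # OR: TRUE where at least one test is TRUE
both   <- x > 10 & x < 30       # AND: TRUE only where both tests are TRUE
found  <- x %in% c(1, 31)       # TRUE where an element of x matches any value on the right

either   # TRUE  TRUE FALSE FALSE FALSE  TRUE
both     # FALSE FALSE  TRUE  TRUE  TRUE FALSE
found    # TRUE FALSE FALSE FALSE FALSE  TRUE
```

Each result is a logical vector with one answer per element of `x`.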

## 9.4 Using logical expressions on different atomic vector types

Enter logical expressions into `filter()` to select the rows corresponding to `TRUE`. This could be a logical vector, or a logical expression containing a relational operator, which will result in a logical vector. (Remember, any time we use a relational operator, we’ll get a logical vector as output.) You will also see that `subset()` can be used in a similar fashion; it belongs to the base package and has fallen out of use.
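The commands that produced the outputs in this section are not shown, so the following is a hedged sketch rather than the original code. It assumes the dplyr package is installed and rebuilds a data frame from the printed rows; the name `tissue_df` is hypothetical.

```r
library(dplyr)

# Hypothetical data frame, rebuilt from the rows printed in this section
tissue_df <- data.frame(
  healthy  = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  tissue   = c("Liver", "Brain", "Testes", "Muscle", "Intestine", "Heart"),
  quantity = c(1, 7, 13, 19, 25, 31),
  stringsAsFactors = FALSE
)

# A logical column can be used directly: keeps Liver, Muscle, Intestine
healthy_rows <- filter(tissue_df, healthy)

# A relational operator produces the logical vector: keeps Liver, Brain
small_rows <- filter(tissue_df, quantity < 10)

# Two questions combined with AND: keeps Testes, Muscle
mid_rows <- filter(tissue_df, quantity > 10 & quantity < 20)
```

In each call, `filter()` keeps only the rows for which the expression evaluates to `TRUE`.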

```
# healthy tissue quantity
# 1 TRUE Liver 1
# 2 TRUE Muscle 19
# 3 TRUE Intestine 25
```

```
# healthy tissue quantity
# 1 TRUE Liver 1
# 2 TRUE Muscle 19
# 3 TRUE Intestine 25
```

```
# healthy tissue quantity
# 1 TRUE Liver 1
# 2 TRUE Muscle 19
# 3 TRUE Intestine 25
```

```
# healthy tissue quantity
# 1 TRUE Liver 1
# 2 FALSE Brain 7
```

```
# healthy tissue quantity
# 1 FALSE Testes 13
# 2 TRUE Muscle 19
```

```
# healthy tissue quantity
# 1 FALSE Testes 13
# 2 TRUE Muscle 19
```

```
# healthy tissue quantity
# 1 TRUE Liver 1
# 2 FALSE Brain 7
# 3 FALSE Testes 13
# 4 TRUE Muscle 19
# 5 TRUE Intestine 25
# 6 FALSE Heart 31
```

```
# healthy tissue quantity
# 1 TRUE Liver 1
# 2 FALSE Brain 7
# 3 TRUE Intestine 25
# 4 FALSE Heart 31
```

```
# [1] healthy tissue quantity
# <0 rows> (or 0-length row.names)
```

```
# healthy tissue quantity
# 1 FALSE Heart 31
```

So what happens when we have two (or more) values we want to search for?

```
# healthy tissue quantity
# 1 TRUE Liver 1
# 2 FALSE Heart 31
```

Writing out each query by hand is fine when you only have two, but what happens when you have many? Maybe they are stored in a vector, perhaps as the result of sub-setting another data frame. You may be tempted to do the following:

```
# healthy tissue quantity
# 1 TRUE Liver 1
# 2 FALSE Heart 31
```

But you should NEVER do that. You’ll see why when you change the order of the query:

```
# [1] healthy tissue quantity
# <0 rows> (or 0-length row.names)
```
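The reason is vector recycling: `==` compares position by position and repeats the shorter vector until it is as long as the longer one. A minimal sketch with a plain character vector, where the tissue names are taken from the printed rows in this section and the object names are made up:

```r
tissue <- c("Liver", "Brain", "Testes", "Muscle", "Intestine", "Heart")

# c("Liver", "Heart") is recycled to: Liver, Heart, Liver, Heart, Liver, Heart
lucky <- tissue == c("Liver", "Heart")     # positions 1 and 6 happen to line up

# c("Heart", "Liver") is recycled to: Heart, Liver, Heart, Liver, Heart, Liver
unlucky <- tissue == c("Heart", "Liver")   # no position lines up

which(lucky)    # 1 6
sum(unlucky)    # 0
```

The first query only works because "Liver" and "Heart" sit at positions where the recycled pattern happens to match them.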

So how do we search for many elements at the same time? Recall the `%in%` operator:

```
# healthy tissue quantity
# 1 TRUE Liver 1
# 2 FALSE Heart 31
```
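Unlike `==`, `%in%` asks of each element on the left whether it appears anywhere on the right, so the order of the query does not matter. A short sketch with a plain character vector (tissue names taken from the printed rows; object names are hypothetical):

```r
tissue <- c("Liver", "Brain", "Testes", "Muscle", "Intestine", "Heart")

wanted_a <- tissue %in% c("Liver", "Heart")
wanted_b <- tissue %in% c("Heart", "Liver")   # same query, reversed order

identical(wanted_a, wanted_b)   # TRUE
tissue[wanted_a]                # "Liver" "Heart"
```

The same logical vector can be passed to `filter()` to keep the matching rows of a data frame.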

## 9.5 Summary of Operators

So far we’ve encountered several operator types. Let’s recap:

Operator type | Operators | Description | Reference |
---|---|---|---|
Assign | `<-` | Create an object | Section 3.6 |
Colon | `:` | Create a series of integers (increments of 1) | Section 6.5 |
Arithmetic | `+` , `-` , `*` , `/` , `^` , `**` , `%/%` , `%%` | Do math | Table 3.1 |
Relational | `>=` , `>` , `<=` , `<` , `==` , `!=` , `!x` | Ask TRUE/FALSE questions | Table 9.1 |
Logical | `&` , `&&` , `\|` , `\|\|` , `%in%` | Combine questions | Table 9.2 |
Double colon | `::` | Call a function within a specific package | Section 5.7 |
Pipe | `%>%` | Pass the result of one line of code into the next | Example ??, ?? |

## 9.6 Exercises for Logical Expressions

Complete the following exercises.

**Exercise 9.1 (Identify contaminants)** Returning to the `protein_df` data frame, the `Contaminant` column contains a categorical variable with two possible values: `"+"` identifies contamination and blanks `""` identify real proteins. First, answer the following questions:

- How many contaminants are present in the data-set?
- What fraction of the total do they represent?

**Exercise 9.2 (Remove contaminants)** Finally, remove all contaminants from the data-set, and save the new, clean data-set under the same name, `protein_df`. Your data-set should now have the following dimensions:

`# [1] 1207 17`

Try to solve the following exercises using the material covered so far. We’ll return to these again later on and see if we can simplify the process.

**Exercise 9.3 (Find protein values)** Given a list of UniProt IDs:

- GOGA7
- PSA6
- S10AB

Find the corresponding \(\log_2\) ratios for each of the two conditions of interest (H/M, M/L).

**Exercise 9.4 (Find significant hits)** The columns ending with `Sig` contain p-values from hypothesis testing, indicating how likely it is that the corresponding ratio would be observed by chance. For the H/M ratio column, create a new data frame containing only proteins that have a p-value less than 0.05.

**Exercise 9.5 (Find extreme values)** For the H/M ratio column, create a new data frame containing only proteins that have a \(\log_{2}\) ratio above 2.0 or below -2.0.

## 9.7 The Floating Point Trap

Real numbers, with decimal places, are type `double` in R (and in computing in general); they are also referred to as “floating point numbers”. These names refer to the way numeric values are stored in memory.^{41}

For example, we know that `0.3/3` equates to `0.1`:

`# [1] 0.1`

but it turns out that’s not true when we check using a relational operator:

`# [1] FALSE`

That’s really bad news if you’re basing your answers on this result! Why does this occur?

First off, to check for equivalency, we can use:

`# [1] TRUE`
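The command used for this check is not shown above. One base R function that tests for near-equality is `all.equal()`, which compares two numbers within a small tolerance; the sketch below assumes that is the kind of check intended here:

```r
exact <- 0.3 / 3 == 0.1                     # exact comparison of the stored values
close <- isTRUE(all.equal(0.3 / 3, 0.1))    # comparison within a small tolerance

exact   # FALSE
close   # TRUE
```

`all.equal()` returns either `TRUE` or a text description of the difference, so wrapping it in `isTRUE()` guarantees a single `TRUE`/`FALSE` answer.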

Whenever we execute a command *without* assigning the output to an object, the output appears in one of two possible *devices*: the graphical device for plots or, as in this case, the console (or, for an R Markdown or LaTeX document, whatever output document you are generating). Under the hood, R is calling the `print()` function. If we increase the number of significant digits it shows, we’ll see why the results are not the same:

`# [1] 0.09999999999999999`

So that means if we round the result, it should equate to 0.1:

`# [1] TRUE`
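The two steps above can be sketched as follows. The exact calls are assumptions: `digits = 16` is chosen to reveal the stored value, and rounding to one decimal place is one of several roundings that would work.

```r
x <- 0.3 / 3

print(x, digits = 16)          # shows more digits than the default 7

below   <- x < 0.1             # TRUE: the stored value is slightly less than 0.1
rounded <- round(x, 1) == 0.1  # TRUE: after rounding, the comparison succeeds
```

Rounding before comparing works, but you have to choose the number of decimal places yourself, and it should suit the precision of your data.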

Let’s take a look at some more involved examples. In the following sequence:

`# [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0`

There is exactly one 0.3, but when I ask to see it:

```
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [11] FALSE
```
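A sketch of the same situation, assuming the sequence was created with `seq(0, 1, by = 0.1)`, together with one possible workaround: asking whether each value is *within a small distance* of 0.3 instead of exactly equal to it. The tolerance of `1e-8` is an arbitrary choice for illustration.

```r
s <- seq(0, 1, by = 0.1)

exact_hits <- sum(s == 0.3)               # exact comparison finds nothing
near_hits  <- sum(abs(s - 0.3) < 1e-8)    # tolerance-based comparison finds one value
position   <- which(abs(s - 0.3) < 1e-8)  # and it is the 4th element

exact_hits   # 0
near_hits    # 1
position     # 4
```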

I discover that there are none! In another example, all of the following expressions evaluate to `0.3`:

`# [1] 0.3 0.3 0.3 0.3 0.3`

But when I look for real unique values here, expecting a single `0.3`, I get three `0.3`s.^{42}

`# [1] 0.3 0.3 0.3`

This is *not* an R-specific problem - it’s a computing problem. If you are going to ask questions like the ones above, you need to be aware that the answer may be affected by the storage method of the data.

This example already gives us a preview of how we can extract specific pieces of information from a data-set using . This topic will be explored further with and .↩︎

For more information on this and other terrible R problems, see “The R Inferno” by Patrick Burns (Burns Statistics).↩︎

We see why this occurs when we print more digits, e.g. `print(all_0.3, digits = 17)`.↩︎