10 R Markdown

10.1 Learning Objectives

All the steps of your data analysis need to be documented in a human-readable format for transparent and reproducible research. In this section you’ll learn about:

  • R Markdown
  • Generating Word, HTML, and PDF documents

Reproducible research refers to academic research papers containing the full computational environment used to produce the results. This includes not only commentary, but also the data and code that can be used to reproduce the results and create new work based on the research.

The goal of reproducible research is to document your data analysis, making it easier for others to see exactly what you’ve done and to reproduce it.

10.2 Scripting

In the previous section we conducted our analysis by writing a script, which is simply a text document containing a series of commands. Like a function, a script is a piece of standard code that can be reused. Writing scripts serves two purposes: it makes your data analysis transparent and reproducible, and it saves time because you don’t have to repeat the same steps over and over again.

10.3 R Markdown and the knitr Package

The gold standard for reproducible research is interweaving the raw data, the analysis and your interpretation of the data into a single document.

Three tools are necessary for reproducible research of this type. First, R provides the foundation for analyzing and visualizing your data. Because every step is written as code, all steps of the analysis are documented. Second, markdown, a lightweight markup language, is used for formatting the text. For our purposes, we will use R Markdown, which combines markdown with R code and serves as a gentle introduction to literate programming. Third, the knitr package allows you to “knit” R code, and its output, together with text descriptions.

10.4 R Markdown

Exercise 10.1 (R Markdown) As you read through this chapter, begin composing an R Markdown document that renders into an HTML document. Think about how to properly format the output.

You can use either:

  1. The functions that you’ve composed as part of your chosen challenge, or
  2. The commands provided below, which take the built-in data frame PlantGrowth.
# Example text for your markdown file 

library(tidyverse)

# Descriptive statistics
PlantGrowth %>% 
  group_by(group) %>% 
  summarise(avg = mean(weight),
            stdev = sd(weight))

# Data visualization
ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_jitter(width = 0.10, alpha = 0.5, shape = 16)

# ANOVA
plant_lm <- lm(weight ~ group, data = PlantGrowth)
anova(plant_lm)

There are three components to an R Markdown file:

  1. The YAML header
  2. Non-code commentary
  3. Code chunks
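To see how the three components fit together, here is a minimal sketch that writes a small R Markdown file from R. The file name plant_growth.Rmd and its contents are illustrative assumptions. The commented-out rmarkdown::render() call would compile it, provided the rmarkdown package and pandoc are installed.

```r
# Hypothetical file name; any name ending in .Rmd works
rmd_file <- file.path(tempdir(), "plant_growth.Rmd")

writeLines(c(
  "---",                                  # 1. YAML header starts
  "title: \"Plant Growth Analysis\"",
  "output: html_document",
  "---",                                  #    YAML header ends
  "",
  "## Descriptive statistics",            # 2. Non-code commentary
  "",
  "Mean weight per group:",
  "",
  "```{r}",                               # 3. Code chunk starts
  "aggregate(weight ~ group, data = PlantGrowth, FUN = mean)",
  "```"                                   #    Code chunk ends
), rmd_file)

# Uncomment to compile (needs the rmarkdown package and pandoc):
# rmarkdown::render(rmd_file)
```

Opening the file in RStudio shows the same structure you would type by hand.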

10.5 The YAML Header

Every R Markdown file should begin with a YAML header, which specifies how the document should be compiled. The minimum YAML header contains:

---
output: html_document
---

This simply means we want an HTML document (i.e. a stand-alone web page), but you can also specify word_document or pdf_document (the latter requires a LaTeX installation) instead. In addition, you can include more information in the header, e.g.:

---
title: "Plant Growth Analysis"
author: "Rick Scavetta"
date: "31/10/2021"
output: html_document
---

These additional fields specify contents in the header of the output document, regardless of the output type (Word, HTML, PDF) specified.

Exercise 10.2 (R Syntax) Open a new blank document in RStudio and change the type in the lower right corner to R Markdown. Begin your document by typing in your YAML header.

10.6 Non-code Commentary

This is where markdown comes into play. Markdown is a lightweight markup language. Markup languages such as HTML and LaTeX can be tedious to write and difficult to learn. In contrast, markdown is easy to both write and learn! Compare some common commands in Table 10.1.

Table 10.1: Common LaTeX and markdown formatting commands. $...$ denotes “math mode”, in which you can enter an equation in place of ...; it works the same way in LaTeX as in markdown.

| LaTeX | Markdown | Result |
|---|---|---|
| \textit{italics} | *italics* or _italics_ | italics (in italic type) |
| \textbf{bold} | **bold** or __bold__ | bold (in bold type) |
| $E = mc^2$ | $E = mc^2$ | E = mc² (typeset as an equation) |
| $CO_2$ | $CO_2$ | CO₂ |
| \\ | Two trailing spaces at the end of a line | Start a new line |

Some commands are actually the same, but because LaTeX offers so many more options, it can be overwhelming to learn. For example, a bulleted list such as:

  • item 1
  • item 2
  • item 3

is straightforward to write in markdown:

- item 1
- item 2
- item 3

But in LaTeX it would be written as:

\begin{itemize}
  \item item 1
  \item item 2
  \item item 3
\end{itemize}

In addition, you can use # to add section headings, where the number of # characters denotes the level of the heading. Note the difference from # as the comment character in R code!

If you can’t figure out how to do something in markdown, you can still use LaTeX, but you shouldn’t need to. If you want to use only , that is also possible. For example, this book was written with bookdown, an R package that extends R Markdown.

Exercise 10.3 (Headers) Add section headers and some text to your document as per the template document.

10.7 Code Chunks

R code appears in code chunks. All code chunks have the basic structure:

```{r}
```

The opening line signals to knitr that the non-code commentary is over and that the following lines, up to the closing ```, should be processed as R code.

When the document is compiled, each chunk is executed sequentially. For example:

```{r}
log2(8)
```

will produce:

log2(8)
#> [1] 3

as output after compilation. There are a variety of chunk options, which control how each chunk will be handled. Table 10.2 lists some of the most common chunk options.

The chunk name is the first argument and is not explicitly named; each name must be unique within the document. In the table, character means any alphanumeric combination. Recall that logical means either TRUE or FALSE.

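Table 10.2: Most commonly used chunk options.

| Option | Type | Description |
|---|---|---|
| Position 1 | Unquoted character | Name of the chunk |
| echo | Logical | Display the code |
| eval | Logical | Execute the code |
| include | Logical | Include the chunk's code and output in the document (the code still runs) |
| cache | Logical | Cache the results |
| message | Logical | Show regular messages |
| warning | Logical | Show warning messages |
| error | Logical | Show error messages and keep going (TRUE) instead of stopping (FALSE) |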

For example, the following chunk is called calcLog and will show only the output:

```{r calcLog, echo = FALSE, eval = TRUE}
log2(8)
```

This chunk will only show the code, but won’t calculate anything:

```{r showLog, echo = TRUE, eval = FALSE}
log2(8)
```

Chunk options may be defined globally, which means that all chunks will have the same options set. This is done by calling

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

in your first chunk. This calls the function opts_chunk$set() from the knitr package (:: accesses a function within a package) and sets the argument echo to FALSE. That means for every chunk, the code will not be shown. Now

```{r}
log2(8)
```

will just produce the output and not show the code:

#> [1] 3

Chunk options can also be set locally. Local chunk options are specified in the chunk itself and override the global settings. For example

```{r, echo = FALSE}
log2(8)
```

will hide the commands, even if the global option is set to TRUE.
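In practice, a setup chunk often sets several defaults at once. The sketch below shows what the body of such a setup chunk might look like; the particular selection of options is an assumption, not a requirement. fig.width and fig.height are knitr's default figure dimensions, in inches.

```r
# Body of a setup chunk, e.g. one labelled setup with include = FALSE
knitr::opts_chunk$set(
  echo = FALSE,     # hide code in every chunk by default
  message = FALSE,  # hide messages, e.g. the startup text of library(tidyverse)
  warning = FALSE,  # hide warnings
  fig.width = 6,    # default figure width in inches
  fig.height = 4    # default figure height in inches
)

# Check the current global default for echo
knitr::opts_chunk$get("echo")
```

Any individual chunk can still override these defaults locally, e.g. with echo = TRUE.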

10.8 In-line Code

Code chunks, as defined above, are set off from the surrounding text as separate blocks. If you want to integrate code in-line, i.e. in the middle of a sentence, you can use the shortcut

`r ...`

replacing the … with your R code. This works best when the result is a short vector, such as a single number or string.
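A common pattern is to compute and round a value in a regular code chunk, then reference the resulting length-one vector in the text, which keeps the in-line code short. A minimal sketch with PlantGrowth; the variable names n_plants and avg_weight are hypothetical:

```r
# In a code chunk: compute short, already formatted results
n_plants <- nrow(PlantGrowth)
avg_weight <- round(mean(PlantGrowth$weight), 2)

# In the text you could then write, for example:
#   We weighed `r n_plants` plants; the mean weight was `r avg_weight`.
```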

10.9 Tables

There are several functions that can convert data sets or the results of statistical tests into nicely formatted tables. The style sometimes depends on the output type you’ve specified in the YAML header. For LaTeX, I prefer xtable, which allows for fine-grained control. For markdown, the easiest entry points are the pander() function in the pander package, or the kable() function in the knitr package. For HTML, the DT package is the most flexible option.

Sometimes you’ll want to generate a table manually. For this, tablesgenerator.com allows you to format text into LaTeX or markdown; just copy and paste the results into your document.
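As a sketch of the kable() route, the following summarises PlantGrowth per group in base R (the dplyr code from Exercise 10.1 would work equally well) and formats the result as a markdown table. The format = "pipe" argument forces a markdown table; inside a chunk you would usually omit it and let knitr choose a format that suits the output type.

```r
# Per-group mean and standard deviation of weight, using base R
groups <- split(PlantGrowth$weight, PlantGrowth$group)
plant_summary <- data.frame(
  group = names(groups),
  avg   = unname(vapply(groups, mean, numeric(1))),
  stdev = unname(vapply(groups, sd, numeric(1)))
)

# Format as a markdown (pipe) table, rounded to 2 decimal places
plant_table <- knitr::kable(plant_summary, format = "pipe", digits = 2,
                            caption = "Mean and standard deviation of weight per group")
plant_table
```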

10.10 The knitr package

R Markdown files are saved using the .Rmd extension. knitr (think “knit-R”) is the package that assembles the three parts of an R Markdown document. Running knitr::knit() on your file executes the code chunks and produces a plain markdown (.md) file; rmarkdown::render() goes one step further and converts the result into the output format named in the YAML header. The easiest and most common way, however, is to simply use the “Knit” button in RStudio.

As a last note, you can produce an output file from a regular R script using knitr::spin() or using the “notebook” button in RStudio.
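knitr::spin() works on ordinary R scripts in which lines starting with #' are treated as markdown text and lines starting with #+ hold chunk options. Below is a minimal sketch of such a script; the file name plant_growth.R is hypothetical. Because the special lines are ordinary R comments, the script also runs unchanged with source().

```r
#' ---
#' title: "Plant Growth Analysis"
#' output: html_document
#' ---
#'
#' ## Descriptive statistics
#'
#' Lines starting with `#'` become markdown text when the script is spun.

#+ group-means, echo = FALSE
group_means <- aggregate(weight ~ group, data = PlantGrowth, FUN = mean)
group_means
```

Saved as plant_growth.R, calling rmarkdown::render("plant_growth.R") would spin and render it in one step, using the YAML header written in the #' lines.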

The following websites provide a good starting point for using knitr and R Markdown:

  • knitr: Elegant, flexible and fast dynamic report generation with R
  • Using R Markdown with RStudio

Exercise 10.4 (Complete your chosen challenge) Now that you have some experience making a simple R Markdown document, continue with the next section and introduce parameters to make your documents flexible!

10.11 Parametrized reports

Parametrized R Markdown documents allow you to specify values for your analysis at the time you compile them. In the case of our examples, you’ll be able to specify things like the diamond clarity, country or movie name.

Exercise 10.5 (Include parameters) For our exercises here, parametrize your R Markdown document by allowing the user to, e.g.:

  • Exclude a specific feed type from the analysis
  • Choose whether to show the 95% confidence interval or the standard deviation in the plot
  • Plot the within-group Z-scores instead of the raw values

To accomplish this, you’ll need to do three things:

  1. Declare parameters at the beginning of the R Markdown document,
  2. Access the parameters within the R Markdown document, and
  3. Define the value of the parameters when compiling the R Markdown document.
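For the third option in Exercise 10.5, a within-group Z-score rescales each value by its own group's mean and standard deviation, z = (x - group mean) / group standard deviation. A minimal base-R sketch using ave(), which applies a function within each group and returns a vector aligned with the original rows; the data frame name plants is hypothetical:

```r
plants <- PlantGrowth
# z = (weight - group mean) / group standard deviation, computed per feed type
plants$z <- ave(plants$weight, plants$group,
                FUN = function(w) (w - mean(w)) / sd(w))

# Each group's Z-scores now have mean 0 and standard deviation 1
round(tapply(plants$z, plants$group, mean), 10)
tapply(plants$z, plants$group, sd)
```

Plotting plants$z instead of plants$weight, e.g. depending on a logical parameter, gives the requested view.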

10.12 Declaring parameters

Use the params field in the YAML header to specify your parameters. Each parameter goes on its own line, indented beneath params, in the form name: value.

---
title: "Movie Analysis"
output: html_document
params:
  name: Gone with the Wind
  year: 1939
---

Parameters can take any of the basic atomic vector types: logical, integer, double, and character. You can also supply an R expression by prefixing it with !r, as such:

---
title: "Movie Analysis"
output: html_document
params:
  name: Gone with the Wind
  year: 1939
  date: !r Sys.Date()
---

R expressions in the YAML header are executed before any code in the document. That means you have to explicitly state package dependencies using :: notation, as such: !r lubridate::today().

10.13 Accessing parameters

Parameters are stored as named elements in a read-only list called params, so they can be accessed with $:

params$name
params$year
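Inside a chunk, parameters behave like any other list elements. The sketch below covers the first two options of Exercise 10.5, assuming hypothetical parameters exclude (a feed type) and spread ("ci" or "sd") declared in the YAML header, e.g. exclude: trt1 and spread: ci. Since params only exists while the document is being knit, the first line creates a stand-in so the code can also be tried in the console.

```r
# Stand-in for interactive use only; when knitting, knitr supplies `params`
# from the YAML header. Both parameter names here are hypothetical.
if (!exists("params")) params <- list(exclude = "trt1", spread = "ci")

# Drop the excluded feed type and its now-empty factor level
plants <- droplevels(subset(PlantGrowth, group != params$exclude))

# Per-group mean plus either a 95% CI half-width (t-based) or the SD
groups <- split(plants$weight, plants$group)
error_bar <- vapply(groups, function(w) {
  if (params$spread == "ci") {
    qt(0.975, df = length(w) - 1) * sd(w) / sqrt(length(w))
  } else {
    sd(w)
  }
}, numeric(1))

plant_summary <- data.frame(group = names(groups),
                            avg   = unname(vapply(groups, mean, numeric(1))),
                            error = unname(error_bar))
plant_summary
```

The avg and error columns can then be passed to a plot, e.g. as the center and half-width of error bars.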

10.14 Defining parameter values

You can set the values and make use of parameters in three ways:

  1. Using the Knit button within RStudio.
  2. rmarkdown::render() with the params argument.
  3. Using an interactive user interface to input parameter values.

The Knit button and the interactive user interface rely on working interactively in RStudio, which is convenient, but not reproducible. Calling rmarkdown::render() is programmable, which makes it both convenient and reproducible.

10.14.1 The Knit button

Clicking the Knit button in RStudio will take the default values listed in the YAML metadata, if specified.

10.14.2 The interactive user interface

Alternatively, you can use the pull down menu on the Knit button to specify the values for each parameter by choosing Knit with Parameters.

The input controls for different types of parameters can be customized by specifying additional sub-items within the parameter specification in YAML.

Adapting our above example to include some settings:

---
title: My Document
output: html_document
params:
  year:
    label: "Year"
    value: 2017
    input: slider
    min: 2010
    max: 2018
    step: 1
    sep: ""
  name:
    label: "Movie:"
    value: Gone with the Wind
    input: select
    choices: [Gone with the Wind, "Matrix, The"]
  printcode:
    label: "Display Code:"
    value: TRUE
---

10.14.3 Knit with custom parameters

The most flexible way is to define parameters using values given to rmarkdown::render():

rmarkdown::render("MyDocument.Rmd", params = list(
  name = "Matrix, The"
))
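Because render() is an ordinary function, you can also loop over parameter values to produce one report per value. A sketch, assuming a hypothetical MyDocument.Rmd that declares an exclude parameter; each call renders in a fresh environment (envir = new.env()) so objects from one report cannot leak into the next.

```r
# Hypothetical: MyDocument.Rmd declares a parameter named `exclude`
feeds <- levels(PlantGrowth$group)
output_files <- paste0("plant_report_without_", feeds, ".html")

# Only attempt rendering if the document is actually present
if (file.exists("MyDocument.Rmd")) {
  for (i in seq_along(feeds)) {
    rmarkdown::render("MyDocument.Rmd",
                      params = list(exclude = feeds[i]),
                      output_file = output_files[i],
                      envir = new.env())
  }
}
```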