Chapter 9 Exercises
Now that we’re familiar with some of the basic concepts of functional and object-oriented programming, let’s work on some real data!
9.1 Our Dataset
We already introduced the martians dataset when we set out to do some basic statistics in R. Here, we’ll add a popular dataset called diamonds from the {ggplot2} package, which you’ve already installed as part of the {tidyverse}. These are the commands to obtain our data.
# From our class repo
martians <- read_tsv("data/martians.txt")
## Rows: 20 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (5): Site, Nose, Antennae, AgeIndex, EyeIndex
## dbl (11): Group, Height, Height_UnequalVar8, Height_UnequalVar, BMI, Aptitud...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# From ggplot2
data(diamonds)
diamonds contains data on over 50,000 diamonds. The data set has the following variables:
Variable | Description |
---|---|
price | Price in US dollars ($326 – $18,823) |
carat | Weight of the diamond (0.2 – 5.01) |
cut | Quality of the cut (Fair, Good, Very Good, Premium, Ideal) |
color | Diamond colour, from D (best) to J (worst) |
clarity | A measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best)) |
x | Length in mm (0 – 10.74) |
y | Width in mm (0 – 58.9) |
z | Depth in mm (0 – 31.8) |
depth | Total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43–79) |
table | Width of top of diamond relative to widest point (43–95) |
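The depth formula in the table can be tried out on a couple of made-up stones. This is only a sketch: the data frame toy and its values are hypothetical, and the factor of 100 is our assumption, inferred from the 43–79 range given for depth.

```r
# Hypothetical measurements in mm (not taken from the diamonds data)
toy <- data.frame(
  x = c(4.00, 5.00),
  y = c(4.10, 5.20),
  z = c(2.50, 3.10)
)

# Total depth percentage: z divided by the average of x and y.
# Note: mean(x, y) in R would NOT average x and y (the second argument
# of mean() is `trim`), so the average is spelled out explicitly.
# The factor of 100 is an assumption inferred from the 43-79 range.
toy$depth_pct <- 100 * 2 * toy$z / (toy$x + toy$y)

toy
```

Writing the average as (x + y) / 2 keeps the calculation vectorised, so it works on a whole column at once.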
Let’s begin exploring our data by looking at some basic plots and doing some transformations.
9.2 Exercises I
In this section we’ll look at the diamonds dataset. To do these exercises, use the tools we discussed in the chapters on logical expressions and indexing, plus the functions from the tidyverse that we presented, like filter() and of course [].
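As a quick reminder of how the two approaches compare, here is a minimal sketch on a tiny made-up data frame. The name gems and its values are hypothetical and only stand in for a larger dataset.

```r
library(dplyr)

# Hypothetical toy data, standing in for a larger data set
gems <- data.frame(
  price = c(300, 950, 4200, 12000),
  cut   = c("Fair", "Ideal", "Ideal", "Good")
)

# Base R: a logical vector inside [] keeps the rows where it is TRUE
ideal_base <- gems[gems$cut == "Ideal" & gems$price > 500, ]

# dplyr: filter() takes the same conditions, separated by commas (AND)
ideal_dplyr <- filter(gems, cut == "Ideal", price > 500)

nrow(ideal_base)
nrow(ideal_dplyr)
```

Both calls select the same rows. Which one you use is mostly a matter of taste and of what the rest of your code looks like.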
As always, you are free to use whatever resources you have found helpful. If you come up with something that we didn’t discuss in class, we can review it.
Exercise 9.4 (Summarizing proportions) - What proportion of the whole is made up of each category of clarity?
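One possible way to get proportions of a whole in base R is table() followed by prop.table(). The sketch below uses a made-up vector called grade (hypothetical, not the real clarity column), so adapting it to the exercise is left to you.

```r
# Hypothetical categorical variable (not the real clarity column)
grade <- c("A", "B", "A", "C", "A", "B", "A", "C")

counts <- table(grade)          # number of observations per category
props  <- prop.table(counts)    # counts divided by the total

props
sum(props)                      # proportions of the whole add up to 1
```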
9.3 Exercises II
In this section we’ll return to the martians dataset.
Split the height variable into lower and upper halves. Calculate the proportion of blue and red-nosed Martians in each half. Do you think there is a real difference in the proportion of nose colors between the lower and upper 50% of the sample?
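A sketch of the general technique, splitting at the median and then computing proportions within each half, might look like this. The data frame toy, its columns size and colour, and all values are invented for illustration.

```r
# Hypothetical toy data (values invented for illustration)
toy <- data.frame(
  size   = c(150, 160, 170, 180, 190, 200, 210, 220),
  colour = c("Blue", "Blue", "Red", "Blue", "Red", "Red", "Blue", "Red")
)

# Label each row as lower or upper half, splitting at the median
toy$half <- ifelse(toy$size <= median(toy$size), "lower", "upper")

# Cross-tabulate, then convert each row (half) to proportions
counts <- table(toy$half, toy$colour)
half_props <- prop.table(counts, margin = 1)

half_props
```

With margin = 1 each row sums to 1, so the proportions are read within a half rather than across the whole sample.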
Eye) for each age group in AgeIndex.
Exercise 9.11 (Transformations and ) - In the Statistical Literacy reference material, we described a scenario where each Martian’s fastest time to run 100 meters was measured, first on Mars and then on Earth. Thus, the data are paired: the same individual was measured before and after a treatment was given (here, changing planets).
- Calculate a paired two-sample t-test when using speed described by height.
- Calculate the difference in running time between the two locations for each Martian.
- Use this variable to perform a 1-sample t-test using t.test().
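The link between the two approaches can be sketched on made-up numbers. The vectors before and after are hypothetical timings in seconds, not values from the martians dataset.

```r
# Hypothetical paired measurements (invented numbers, in seconds)
before <- c(12.1, 11.8, 13.0, 12.5, 11.9, 12.7)
after  <- c(12.9, 12.4, 13.2, 13.1, 12.6, 13.5)

# Paired two-sample t-test
paired_test <- t.test(after, before, paired = TRUE)

# One-sample t-test on the per-individual differences, against mu = 0
differences <- after - before
one_sample_test <- t.test(differences, mu = 0)

# Both approaches give the same t statistic and p-value
paired_test$statistic
one_sample_test$statistic
```

A paired t-test is a 1-sample t-test on the differences, which is why the two results agree.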