Chapter 9 Exercises

Now that we’re familiar with some of the basic concepts of functional and object-oriented programming, let’s work on some real data!

9.1 Our Dataset

We already introduced the martians dataset when we set out to do some basic statistics in R. Here, we’ll add a popular dataset called diamonds from the {ggplot2} package, which you’ve already installed as part of the {tidyverse}. These are the commands to obtain our data.

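# Load the tidyverse: {readr} provides read_tsv() and {ggplot2} provides diamonds
library(tidyverse)

# From our class repo
martians <- read_tsv("data/martians.txt")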
## Rows: 20 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr  (5): Site, Nose, Antennae, AgeIndex, EyeIndex
## dbl (11): Group, Height, Height_UnequalVar8, Height_UnequalVar, BMI, Aptitud...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# From ggplot2
data(diamonds)

diamonds contains data on over 50,000 diamonds. The dataset has the following variables:

Variable   Description
price      Price in US dollars ($326 – $18,823)
carat      Weight of the diamond (0.2 – 5.01)
cut        Quality of the cut (Fair, Good, Very Good, Premium, Ideal)
color      Diamond colour, from D (best) to J (worst)
clarity    How clear the diamond is: I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best)
x          Length in mm (0 – 10.74)
y          Width in mm (0 – 58.9)
z          Depth in mm (0 – 31.8)
depth      Total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43 – 79)
table      Width of top of diamond relative to widest point (43 – 95)
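
The depth formula in the table can be checked by hand. The sketch below wraps it in a small helper; the function name depth_pct() and the example measurements are hypothetical, and the factor of 100 is an assumption based on the column being stored as a percentage (values around 43 to 79).

```r
# Hypothetical helper: total depth percentage from length (x), width (y)
# and depth (z), all in mm. Multiplying by 100 turns the ratio into a percentage.
depth_pct <- function(x, y, z) {
  100 * 2 * z / (x + y)   # equivalent to 100 * z / (mean of x and y)
}

# Made-up measurements, for illustration only
depth_pct(x = 4, y = 4, z = 2.4)
# [1] 60
```

Because the helper only uses arithmetic operators, it is vectorized and works on whole columns as well as single values.
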

Exercise 9.1 (Examine structure) - What type of data does each column contain? Use some of the functions we introduced in the dataframes section to explore the basic structure of our new object.
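
As a hint that does not give the answer away, here is the same kind of inspection applied to R's built-in iris data frame instead of diamonds. str(), head() and sapply() are base R; glimpse() from {dplyr} is a tidyverse alternative to str().

```r
# Inspect the structure of a built-in data frame (iris); the same calls
# work on diamonds or martians
str(iris)                          # column names, types and first values
head(iris, n = 3)                  # first three rows
col_types <- sapply(iris, class)   # class of every column as a named vector
col_types
```
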

Let’s begin exploring our data by looking at some basic plots and doing some transformations.

9.2 Exercises I

In this section we’ll look at the diamonds dataset. For these exercises, use the tools we discussed in the chapters on logical expressions and indexing: base R’s [] indexing as well as the tidyverse functions we presented, such as filter().

As always, you are free to use any resources you find helpful. If you come up with something that we didn’t discuss in class, we can review it.

Exercise 9.2 (Counting individual groups) - How many diamonds in the dataset have a clarity of “IF”?
Exercise 9.3 (Measuring proportions) - What fraction of the total do they represent?
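
One way to approach counting and fractions is sketched below on iris (counting the "setosa" species) rather than on diamonds, so the exercise answer is left to you. It relies on the fact that TRUE counts as 1 and FALSE as 0 in arithmetic. With the tidyverse loaded, nrow(filter(iris, Species == "setosa")) is an equivalent count.

```r
# Count rows in one category and express them as a fraction of all rows
n_setosa     <- sum(iris$Species == "setosa")           # TRUE counts as 1
n_setosa_idx <- nrow(iris[iris$Species == "setosa", ])  # same count via [] indexing
frac_setosa  <- n_setosa / nrow(iris)                   # fraction of the total

# mean() of a logical vector gives the same fraction in one step
mean(iris$Species == "setosa")
```
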

Exercise 9.4 (Summarizing proportions) - What proportion of the whole is made up of each category of clarity?
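
For the proportions of every category at once, a common base R pattern is table() followed by prop.table(). The sketch uses the cyl (number of cylinders) column of the built-in mtcars data as a stand-in for clarity.

```r
# Proportion of each category: count with table(), then divide by the total
counts <- table(mtcars$cyl)     # number of cars with 4, 6 and 8 cylinders
props  <- prop.table(counts)    # each count divided by sum(counts)
round(props, 3)
```

The proportions always add up to 1, which is a quick sanity check on your own result.
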

Exercise 9.5 (Finding prices I) - What is the lowest diamond price overall?
Exercise 9.6 (Finding prices II) - What is the range of diamond prices?
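
Minimum and range can be sketched with min() and range(), shown here on mtcars$mpg instead of diamonds$price. Note that R's range() returns both the minimum and the maximum; if you want the range as a single number (max minus min), take diff() of it.

```r
# Lowest value and range of a numeric column
min(mtcars$mpg)             # smallest value
rng <- range(mtcars$mpg)    # c(minimum, maximum)
rng
diff(rng)                   # width of the range: maximum - minimum
```
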
Exercise 9.7 (Finding prices III) - What is the average diamond price for each combination of cut and color?
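
Averaging within every combination of two grouping variables can be sketched with tapply() or aggregate(). Below, mean mpg is computed for each combination of cyl and am (transmission) in mtcars, standing in for price by cut and color. In the tidyverse, group_by() followed by summarise() is the usual equivalent.

```r
# Mean of a numeric column for every combination of two grouping variables
means <- tapply(mtcars$mpg, list(cyl = mtcars$cyl, am = mtcars$am), mean)
means   # rows = cyl, columns = am (0 = automatic, 1 = manual)

# The same values in long format, one row per combination
aggregate(mpg ~ cyl + am, data = mtcars, FUN = mean)
```

tapply() gives a compact cross table, while aggregate() returns a data frame that is easier to sort or plot.
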

9.3 Exercises II

In this section we’ll return to the martians dataset.

Exercise 9.8 (Only tall martians) - Use the entire dataset for this exercise, i.e., both Site I and Site II. Let the median divide the height variable into lower and upper halves. Calculate the proportion of blue- and red-nosed Martians in each half. Do you think there is a real difference in the proportion of nose colors between the lower and upper 50% of the sample?
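
A median split followed by row-wise proportions can be sketched as below, using mtcars: horsepower (hp) is split at its median, and the share of automatic and manual cars (am) is computed within each half. How ties at the median are handled is a choice; this sketch assigns them to the lower half.

```r
# Split a numeric variable at its median and compare category proportions
cars <- mtcars
cars$half <- ifelse(cars$hp > median(cars$hp), "upper", "lower")  # ties go to "lower"
tab <- table(half = cars$half, am = cars$am)
prop.table(tab, margin = 1)   # margin = 1: proportions within each row (half)
```
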
Exercise 9.9 (Difference in means) - Calculate the mean and standard deviation of the eyesight scores (Eye) for each age group in AgeIndex.
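
Group-wise means and standard deviations can be sketched with tapply(), shown on iris (Sepal.Length by Species). If your column contains missing values, tapply() forwards extra arguments to the function, so na.rm = TRUE can be added.

```r
# Mean and standard deviation of one numeric column per group
grp_mean <- tapply(iris$Sepal.Length, iris$Species, mean)  # add na.rm = TRUE if NAs exist
grp_sd   <- tapply(iris$Sepal.Length, iris$Species, sd)
data.frame(mean = grp_mean, sd = grp_sd)
```
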
Exercise 9.10 (Subsetting by site) - Create a new data frame that only contains the values for the Martians at Site I. Use this dataset for the next exercise.
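
Row subsetting can be sketched in two equivalent base R ways, shown on iris. The exact spelling of the site labels in martians is not shown here, so check them first with unique(martians$Site) before writing your condition.

```r
# Keep only the rows that match a condition
setosa_idx <- iris[iris$Species == "setosa", ]   # base R [] indexing
setosa_sub <- subset(iris, Species == "setosa")  # base R subset()
# With the tidyverse loaded, filter(iris, Species == "setosa") does the same
nrow(setosa_idx)
```
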

Exercise 9.11 (Transformations and ) - In the Statistical Literacy reference material, we described a scenario in which each Martian’s fastest time to run 100 meters was measured, first on Mars and then on Earth. The data are therefore paired: the same individual was measured before and after a treatment was applied (here, changing planets).

  • Calculate a paired two-sample t-test when using speed described by height.
  • Calculate the difference in running time between the two locations for each Martian
  • Use this variable to perform a 1-sample t-test using t.test()
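
The last two steps rest on the fact that a paired t-test is the same as a one-sample t-test on the within-individual differences. The sketch below shows this on R's built-in sleep data (extra hours of sleep for the same 10 patients under two drugs), since the Martian running times are not reproduced here.

```r
# sleep is sorted by group and then ID, so g1[i] and g2[i] are the same patient
g1 <- sleep$extra[sleep$group == "1"]
g2 <- sleep$extra[sleep$group == "2"]
stopifnot(identical(sleep$ID[sleep$group == "1"], sleep$ID[sleep$group == "2"]))

paired  <- t.test(g2, g1, paired = TRUE)  # paired two-sample t-test
diffs   <- g2 - g1                        # per-patient difference
one_smp <- t.test(diffs, mu = 0)          # one-sample t-test on the differences

# Both tests give the same t statistic (and the same p-value)
c(paired = unname(paired$statistic), one_sample = unname(one_smp$statistic))
```

Passing the two vectors with paired = TRUE, rather than a formula, keeps the pairing explicit: it only works if both vectors list the individuals in the same order.
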