6 Project challenges

For this workshop, choose one of the following projects to focus on. These projects will allow us to explore some tidyverse functions and test your data viz skills.

The three projects are described below. For each project, you’re asked to write three functions: one for plotting, one for a query, and one that produces a printed statement.

You are not tasked with writing these functions now! At the moment, try to reproduce the output of the functions below without writing an explicit function. We’ll develop the functions themselves as the course continues.

Use the message board for your cohort to discuss which project you’ve chosen. Coordinate with colleagues during the week, over Zoom calls or in person, to work on the project together.

6.1 Challenge 1: `gapminder`

The Gapminder global health statistics data set, found in the gapminder package, is the basis of this challenge. Produce the following three types of functions:

| Type | Definition | Description |
|---|---|---|
| Plotting | `plotCountry(name = NULL)` | Given a country, plot the life expectancy vs the \(\log_{10}\) GDP for all countries. The country of interest should be highlighted as points colored according to year. |
| Query | `getDeltaLE(name = NULL)` | Given a country, return the increase in years of life expectancy between 1952 and 2007. |
| Printout | `printLE(name = NULL, year = NULL)` | Given a country and year, return a statement that tells us the life expectancy in that year and the average for all countries in that year. What percentile is this in the whole distribution? |
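
Since the functions themselves come later, the "Germany" values can first be produced with a plain pipeline. This is a minimal sketch rather than the intended solution. It assumes dplyr and ggplot2 are installed alongside gapminder, that "GDP" refers to the `gdpPercap` column, and that the percentile is the share of countries whose life expectancy is at or below the chosen country's value. The variable names (`country_name`, `target_year`, and so on) are illustrative only.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Illustrative inputs (assumed names), matching the "Germany" example
country_name <- "Germany"
target_year <- 1952

# Query: change in life expectancy between 1952 and 2007
le <- gapminder %>%
  filter(country == country_name, year %in% c(1952, 2007)) %>%
  arrange(year) %>%
  pull(lifeExp)
delta_le <- le[2] - le[1]

# Printout ingredients: country value, yearly average, percentile
year_data <- gapminder %>% filter(year == target_year)
country_le <- year_data %>%
  filter(country == country_name) %>%
  pull(lifeExp)
avg_le <- mean(year_data$lifeExp)
# Assumption: percentile = percentage of countries at or below this value
pct <- 100 * mean(year_data$lifeExp <= country_le)

# Plot: all countries in grey, the country of interest colored by year
p <- ggplot(gapminder, aes(x = log10(gdpPercap), y = lifeExp)) +
  geom_point(colour = "grey80") +
  geom_point(
    data = filter(gapminder, country == country_name),
    aes(colour = year)
  )
```

Printing `delta_le`, `avg_le`, and `pct` gives the numbers to compare against the examples that follow, and printing `p` draws the plot. Swapping the two input values is enough to try a different country or year.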

6.1.1 Example: “Germany”

plotCountry("Germany")

getDeltaLE("Germany")
#> [1] 11.906

printLE("Germany", 1952)
#> [1] "The life expectancy of Germany in 1952 was 67.5 years, compared to the average for all countries, 49.0576197183099. This is the 91.5492957746479th percentile."

6.1.2 Example: “Afghanistan”

plotCountry("Afghanistan")

getDeltaLE("Afghanistan")
#> [1] 15.027

printLE("Afghanistan", 2007)
#> [1] "The life expectancy of Afghanistan in 2007 was 43.828 years, compared to the average for all countries, 67.0074225352113. This is the 5.63380281690141th percentile."

6.2 Challenge 2: `ggplot2movies`

The IMDb movie reviews data set, found in the ggplot2movies package, is the basis of this challenge. Produce the following three types of functions:

| Type | Definition | Description |
|---|---|---|
| Plotting | `plotMovie(name = NULL)` | Given a movie title, return a plot showing the rating versus the length of all movies in the year that movie was released. The movie of interest should be highlighted as a colored point. |
| Query | `getYear(name = NULL)` | Given a movie title, return the release year. |
| Printout | `calcPerc(name = NULL)` | Given a movie title, return a statement that tells us the rating and the average rating of all movies in that year. What percentile is this in the whole distribution? |
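
The same function-free approach works for the movies data. The sketch below is one possible route, not the reference solution. It assumes the `movies` data frame from ggplot2movies with its `title`, `year`, `length`, and `rating` columns, takes the first matching row if a title appears more than once, and again treats the percentile as the share of that year's movies rated at or below the chosen movie. The variable names are illustrative.

```r
library(dplyr)
library(ggplot2)
library(ggplot2movies)

# Illustrative input (assumed name), matching the first example
movie_title <- "Gone with the Wind"

# Query: release year (first match only, an assumption for duplicate titles)
movie_row <- movies %>%
  filter(title == movie_title) %>%
  slice(1)
release_year <- movie_row$year

# Printout ingredients: rating, yearly average, percentile within that year
same_year <- movies %>% filter(year == release_year)
avg_rating <- mean(same_year$rating)
pct <- 100 * mean(same_year$rating <= movie_row$rating)

# Plot: rating vs length for that year, with the chosen movie highlighted
p <- ggplot(same_year, aes(x = length, y = rating)) +
  geom_point(colour = "grey70") +
  geom_point(data = movie_row, colour = "red", size = 3)
```

Titles must match the data exactly, which is why "The Matrix" is looked up as "Matrix, The" in the examples.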

6.2.1 Example: “Gone with the Wind”

getYear("Gone with the Wind")
#> [1] 1939

plotMovie("Gone with the Wind")

calcPerc("Gone with the Wind")
#> [1] "Gone with the Wind had a rating of 8.1, compared to the average for all movies in, 1939 which was6.3504132231405. This is the 95.8677685950413th percentile."

6.2.2 Example: “The Matrix”

getYear("Matrix, The")
#> [1] 1999

plotMovie("Matrix, The")

calcPerc("Matrix, The")
#> [1] "Matrix, The had a rating of 8.5, compared to the average for all movies in, 1999 which was5.63710430721328. This is the 98.0799169693825th percentile."

6.3 Challenge 3: `ggplot2`

The diamonds data set, found in the ggplot2 package, is the basis of this challenge. Produce the following three types of functions:

| Type | Definition | Description |
|---|---|---|
| Plotting | `plotClarity(clarity = NULL)` | Given a diamond’s clarity, return a plot showing the price versus the carat of all diamonds with this clarity. |
| Query | `getClarityDF(clarity = NULL)` | Given a diamond’s clarity, return a data frame containing a sample of 10 randomly chosen observations with this clarity. |
| Printout | `printClarity(clarity = NULL)` | Given a diamond’s clarity, return a statement that tells us the cheapest and most expensive price. What percentile is the most expensive diamond of this clarity type in the whole dataset? |
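
For the diamonds challenge, the three outputs can likewise be assembled step by step before any function exists. This is a hedged sketch under a few assumptions: dplyr 1.0.0 or later is available for `slice_sample()`, the percentile is the share of all diamonds priced at or below the most expensive diamond of the chosen clarity, and the variable names are illustrative.

```r
library(dplyr)
library(ggplot2)  # also provides the diamonds data set

# Illustrative input (assumed name), matching the "IF" example
clarity_type <- "IF"

clarity_data <- diamonds %>% filter(clarity == clarity_type)

# Query: 10 randomly chosen rows of this clarity (differs on every run)
clarity_sample <- clarity_data %>% slice_sample(n = 10)

# Printout ingredients: price range, overall average, percentile of the maximum
price_range <- range(clarity_data$price)
avg_price <- mean(diamonds$price)
pct_max <- 100 * mean(diamonds$price <= price_range[2])

# Plot: price vs carat for diamonds of this clarity
p <- ggplot(clarity_data, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3)
```

Because the sample is random, the ten rows will not match the tibbles shown in the examples unless the same seed is set with `set.seed()` first.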

6.3.1 Example: clarity type “IF”

plotClarity(clarity = "IF")

getClarityDF(clarity = "IF")
#> # A tibble: 10 × 10
#>    carat cut     color clarity depth table price     x     y
#>    <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl>
#>  1  1.28 Premium E     IF       59.8    59 15806  7.1   7.07
#>  2  0.35 Ideal   G     IF       61.3    56   910  4.51  4.53
#>  3  0.56 Very G… D     IF       62.5    59  4025  5.26  5.3 
#>  4  0.27 Ideal   F     IF       61.8    55   760  4.14  4.21
#>  5  0.56 Very G… E     IF       59.9    60  2724  5.31  5.38
#>  6  1.09 Very G… F     IF       61.1    58 10663  6.63  6.69
#>  7  0.35 Ideal   J     IF       61.8    55   569  4.54  4.56
#>  8  2.29 Premium J     IF       61.4    60 18594  8.49  8.45
#>  9  0.32 Ideal   J     IF       61      57   521  4.42  4.46
#> 10  0.53 Ideal   G     IF       62.7    56  2137  5.23  5.13
#> # … with 1 more variable: z <dbl>

printClarity(clarity = "IF")
#> [1] "The diamonds of IF clarity range from $369 - $18806, compared to the average for all diamonds, 3932.79972191324. The most expensive diamond is at the 99.9962921764924th percentile."

6.3.2 Example: clarity type “SI2”

plotClarity(clarity = "SI2")

getClarityDF(clarity = "SI2")
#> # A tibble: 10 × 10
#>    carat cut     color clarity depth table price     x     y
#>    <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl>
#>  1  0.26 Ideal   H     SI2      62.5    53   362  4.09  4.13
#>  2  1.02 Premium E     SI2      59.9    58  4478  6.6   6.55
#>  3  0.43 Very G… E     SI2      61.8    60   688  4.83  4.85
#>  4  1.6  Premium H     SI2      62.4    60  9229  7.48  7.43
#>  5  1.01 Premium I     SI2      62.6    58  3818  6.43  6.38
#>  6  0.92 Fair    H     SI2      66.1    57  2283  6.22  6   
#>  7  0.7  Premium D     SI2      60.8    61  1987  5.69  5.64
#>  8  2.11 Premium D     SI2      60.9    60 18575  8.28  8.21
#>  9  0.76 Very G… F     SI2      63.3    55  2347  5.79  5.83
#> 10  2.17 Very G… J     SI2      60.8    59 13782  8.37  8.41
#> # … with 1 more variable: z <dbl>

printClarity(clarity = "SI2")
#> [1] "The diamonds of SI2 clarity range from $326 - $18804, compared to the average for all diamonds, 3932.79972191324. The most expensive diamond is at the 99.9944382647386th percentile."
