6 Project challenges
For this workshop, choose one of the following projects to focus on. These projects will let us explore some tidyverse functions and test your data viz skills.
The three projects are described below. For each project, you’re asked to write three functions: one for plotting, one for a query, and one that produces a text printout.
You are not expected to write these functions yet! For now, try to produce the output of each function below without wrapping your code in a function. We’ll develop the functions themselves as the course continues.
Use your cohort’s message board to discuss which project you’ve chosen. Coordinate with colleagues during the week, over Zoom calls or in person, to work on the project together.
6.1 Challenge 1: gapminder
The Gapminder global health statistics data set, found in the gapminder package, is the basis of this challenge. Produce the following three types of functions:
Type | Definition | Description |
---|---|---|
Plotting | plotCountry(name = NULL) | Given a country, plot life expectancy against \(\log_{10}\) GDP per capita for all countries. The country of interest should be highlighted as points colored according to year. |
Query | getDeltaLE(name = NULL) | Given a country, return the increase in life expectancy (in years) between 1952 and 2007. |
Printout | printLE(name = NULL, year = NULL) | Given a country and a year, return a statement giving that country’s life expectancy in that year and the average across all countries in the same year. What percentile is this in the whole distribution? |
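As a starting point, here is a minimal sketch of how the three Challenge 1 outputs could be produced directly, without writing any functions yet. It assumes the dplyr, ggplot2 and gapminder packages are installed. The country ("Canada") and year (2007) are arbitrary illustrative choices, and computing the percentile with ecdf() across all countries in the chosen year is one interpretation of the question, not the only one.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Hypothetical, hard-coded choices for illustration; any country/year in the data works
country_of_interest <- "Canada"
year_of_interest <- 2007

# Rows for the chosen country only
country_df <- gapminder %>% filter(country == country_of_interest)

# Plotting: all country-years in grey, the chosen country coloured by year
ggplot(gapminder, aes(x = log10(gdpPercap), y = lifeExp)) +
  geom_point(colour = "grey80") +
  geom_point(data = country_df, aes(colour = year), size = 2) +
  labs(x = "log10 GDP per capita", y = "Life expectancy (years)",
       title = country_of_interest)

# Query: change in life expectancy between 1952 and 2007
delta_le <- country_df %>%
  filter(year %in% c(1952, 2007)) %>%
  arrange(year) %>%
  summarise(delta = last(lifeExp) - first(lifeExp)) %>%
  pull(delta)
delta_le

# Printout: this country's value compared with all countries in the same year
year_df <- gapminder %>% filter(year == year_of_interest)
le <- year_df$lifeExp[year_df$country == country_of_interest]
# Percentage of countries in that year with life expectancy at or below this one
pct <- 100 * ecdf(year_df$lifeExp)(le)

sprintf(
  "In %d, life expectancy in %s was %.1f years, compared to an average of %.1f years for all countries. This is the %.1fth percentile.",
  year_of_interest, country_of_interest, le, mean(year_df$lifeExp), pct
)
```

Once this works for one country, turning country_of_interest and year_of_interest into arguments is essentially the step from this script to plotCountry(), getDeltaLE() and printLE().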
6.2 Challenge 2: ggplot2movies
The IMDB movie ratings data set, found in the ggplot2movies package, is the basis of this challenge. Produce the following three types of functions:
Type | Definition | Description |
---|---|---|
Plotting | plotMovie(name = NULL) | Given a movie title, return a plot showing the rating versus the length of all movies in the year that movie was released. The movie of interest should be highlighted as a colored point. |
Query | getYear(name = NULL) | Given a movie title, return the release year. |
Printout | calcPerc(name = NULL) | Given a movie title, return a statement that tells us the rating and the average rating of all movies in that year. What percentile is this in the whole distribution? |
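A minimal sketch of the getYear and plotMovie outputs, assuming dplyr, ggplot2 and ggplot2movies are installed. The title is hard-coded, and if several films share a title this sketch simply keeps the first match; that is an assumption you may want to handle differently.

```r
library(dplyr)
library(ggplot2)
library(ggplot2movies)

# Hard-coded for now; this becomes the function argument later
title_of_interest <- "Gone with the Wind"

# Query: release year of the chosen title (first match only)
movie_year <- movies %>%
  filter(title == title_of_interest) %>%
  slice(1) %>%
  pull(year)
movie_year

# Plotting: every movie released that year, with the chosen title highlighted
year_df <- movies %>% filter(year == movie_year)

ggplot(year_df, aes(x = length, y = rating)) +
  geom_point(colour = "grey70") +
  geom_point(data = filter(year_df, title == title_of_interest),
             colour = "red", size = 3) +
  labs(x = "Length (minutes)", y = "IMDB rating",
       title = paste(title_of_interest, movie_year))
```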
6.2.1 Example: “Gone with the Wind”
```r
getYear("Gone with the Wind")
#> [1] 1939
plotMovie("Gone with the Wind")
calcPerc("Gone with the Wind")
#> [1] "Gone with the Wind had a rating of 8.1, compared to the average for all movies in, 1939 which was6.3504132231405. This is the 95.8677685950413th percentile."
```
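One possible way to build the calcPerc statement for this example, again hard-coded and assuming dplyr and ggplot2movies are installed. Here both the average and the percentile are computed over the movies from the same release year, and the percentile comes from ecdf(), i.e. the percentage of that year's movies rated at or below this one; a rank-based definition would treat ties differently.

```r
library(dplyr)
library(ggplot2movies)

title_of_interest <- "Gone with the Wind"

# The movie itself (first match, in case the title is not unique)
movie <- movies %>%
  filter(title == title_of_interest) %>%
  slice(1)

# All movies released in the same year
year_df <- movies %>% filter(year == movie$year)

avg_rating <- mean(year_df$rating, na.rm = TRUE)
# Percentage of that year's movies rated at or below this one
pct <- 100 * ecdf(year_df$rating)(movie$rating)

sprintf(
  "%s had a rating of %.1f, compared to the average for all movies in %d, which was %.2f. This is the %.1fth percentile.",
  title_of_interest, movie$rating, movie$year, avg_rating, pct
)
```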
6.3 Challenge 3: ggplot2
The diamonds data set, found in the ggplot2 package, is the basis of this challenge. Produce the following three types of functions:
Type | Definition | Description |
---|---|---|
Plotting | plotClarity(clarity = NULL) | Given a diamond’s clarity, return a plot showing the price versus the carat of all diamonds with this clarity. |
Query | getClarityDF(clarity = NULL) | Given a diamond’s clarity, return a data frame containing a sample of 10 randomly chosen observations with this clarity. |
Printout | printClarity(clarity = NULL) | Given a diamond’s clarity, return a statement giving the cheapest and most expensive prices for that clarity, compared with the average price of all diamonds. What percentile is the most expensive diamond of this clarity in the whole data set? |
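A sketch of the plotClarity and getClarityDF outputs for a single clarity grade, assuming dplyr (1.0.0 or later, for slice_sample()) and ggplot2 are installed. The set.seed() call is only there to make the random sample reproducible; your ten rows will generally differ from the example output.

```r
library(dplyr)
library(ggplot2)  # also provides the diamonds data set

# Hard-coded for now; this becomes the function argument later
clarity_of_interest <- "IF"

clarity_df <- diamonds %>% filter(clarity == clarity_of_interest)

# Plotting: price vs. carat for this clarity only
ggplot(clarity_df, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3) +
  labs(x = "Carat", y = "Price (US$)",
       title = paste("Clarity:", clarity_of_interest))

# Query: 10 randomly chosen diamonds with this clarity
set.seed(42)  # only so the sample is reproducible
sample_df <- clarity_df %>% slice_sample(n = 10)
sample_df
```

Changing clarity_of_interest to "SI2" covers the second example.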
6.3.1 Example: clarity type “IF”
```r
plotClarity(clarity = "IF")
getClarityDF(clarity = "IF")
#> # A tibble: 10 × 10
#> carat cut color clarity depth table price x y
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl>
#> 1 1.28 Premium E IF 59.8 59 15806 7.1 7.07
#> 2 0.35 Ideal G IF 61.3 56 910 4.51 4.53
#> 3 0.56 Very G… D IF 62.5 59 4025 5.26 5.3
#> 4 0.27 Ideal F IF 61.8 55 760 4.14 4.21
#> 5 0.56 Very G… E IF 59.9 60 2724 5.31 5.38
#> 6 1.09 Very G… F IF 61.1 58 10663 6.63 6.69
#> 7 0.35 Ideal J IF 61.8 55 569 4.54 4.56
#> 8 2.29 Premium J IF 61.4 60 18594 8.49 8.45
#> 9 0.32 Ideal J IF 61 57 521 4.42 4.46
#> 10 0.53 Ideal G IF 62.7 56 2137 5.23 5.13
#> # … with 1 more variable: z <dbl>
printClarity(clarity = "IF")
#> [1] "The diamonds of IF clarity range from $369 - $18806, compared to the average for all diamonds, 3932.79972191324. The most expensive diamond is at the 99.9962921764924th percentile."
```
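The printClarity statement could be assembled along these lines, assuming dplyr and ggplot2 are installed. The percentile of the most expensive diamond is computed with ecdf() over the prices of all diamonds, which is one reasonable reading of "in the whole data set".

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

clarity_of_interest <- "IF"

# Cheapest and most expensive price for this clarity
price_range <- diamonds %>%
  filter(clarity == clarity_of_interest) %>%
  summarise(min_price = min(price), max_price = max(price))

# Percentage of ALL diamonds priced at or below the most expensive one of this clarity
pct <- 100 * ecdf(diamonds$price)(price_range$max_price)

sprintf(
  "The diamonds of %s clarity range from $%d - $%d, compared to the average for all diamonds, %.2f. The most expensive diamond is at the %.4fth percentile.",
  clarity_of_interest, price_range$min_price, price_range$max_price,
  mean(diamonds$price), pct
)
```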
6.3.2 Example: clarity type “SI2”
```r
plotClarity(clarity = "SI2")
getClarityDF(clarity = "SI2")
#> # A tibble: 10 × 10
#> carat cut color clarity depth table price x y
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl>
#> 1 0.26 Ideal H SI2 62.5 53 362 4.09 4.13
#> 2 1.02 Premium E SI2 59.9 58 4478 6.6 6.55
#> 3 0.43 Very G… E SI2 61.8 60 688 4.83 4.85
#> 4 1.6 Premium H SI2 62.4 60 9229 7.48 7.43
#> 5 1.01 Premium I SI2 62.6 58 3818 6.43 6.38
#> 6 0.92 Fair H SI2 66.1 57 2283 6.22 6
#> 7 0.7 Premium D SI2 60.8 61 1987 5.69 5.64
#> 8 2.11 Premium D SI2 60.9 60 18575 8.28 8.21
#> 9 0.76 Very G… F SI2 63.3 55 2347 5.79 5.83
#> 10 2.17 Very G… J SI2 60.8 59 13782 8.37 8.41
#> # … with 1 more variable: z <dbl>
printClarity(clarity = "SI2")
#> [1] "The diamonds of SI2 clarity range from $326 - $18804, compared to the average for all diamonds, 3932.79972191324. The most expensive diamond is at the 99.9944382647386th percentile."
```