Chapter 17 Protein.df Revisited

In the earlier exercises, including the previous chapter on tidy data, we treated the protein.df data frame as a long list rather than as truly tidy data. Let’s look at how we can make fuller use of the tidyverse functions.

17.1 Import and Clean Data

Exercise 17.1 (Import data) Fill in the blanks to:

  • Import “Protein.txt”,
  • Convert it to a tibble,
  • Remove contaminants, and
  • Assign it to the object protein.df.
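
As a rough illustration of these steps, the sketch below runs the same kind of import pipeline on a small made-up file rather than on “Protein.txt” itself. The column names, the Contaminant column with its "+" marker, and the object name toy_protein are all assumptions made for this example, so check them against the real file before reusing any of it.

```r
library(dplyr)

# Write a tiny tab-delimited stand-in for "Protein.txt".
# All columns and values here are hypothetical.
toy_file <- tempfile(fileext = ".txt")
writeLines(c(
  "Uniprot\tContaminant\tIntensity.H\tIntensity.M\tIntensity.L",
  "P1\t\t1000\t2000\t1500",
  "P2\t+\t500\t800\t700",
  "P3\t\t3000\t2500\t2000"
), toy_file)

toy_protein <- read.delim(toy_file, stringsAsFactors = FALSE) %>%
  as_tibble() %>%                 # convert the data frame to a tibble
  filter(Contaminant != "+")      # assumed marker: "+" flags a contaminant

toy_protein
```

Only the rows that are not flagged in the assumed Contaminant column remain.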

Use this as a template:

17.2 Process Intensities

Exercise 17.2 Using the dplyr functions, fill in the blanks to:

  • Transform all Intensity.* columns to log10. (mutate_at())
  • Add the H and M intensities, and the M and L intensities, as we did previously, and save the new columns as H.M and M.L. (mutate())
  • Select only the Uniprot, H.M and M.L columns. (select())
  • Make a tidy data set with three columns Uniprot, Ratio and Intensity (gather())
  • Assign this to an object called onlyInt
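
The chain of steps above can be sketched on a toy tibble as follows. The intensity values and the names toy_protein and toy_onlyInt are invented for illustration, and the example assumes the intensity columns are called Intensity.H, Intensity.M and Intensity.L.

```r
library(dplyr)
library(tidyr)

# Hypothetical intensities, chosen so the log10 values are whole numbers
toy_protein <- tibble(
  Uniprot     = c("P1", "P2"),
  Intensity.H = c(100, 1000),
  Intensity.M = c(1000, 10),
  Intensity.L = c(10, 100)
)

toy_onlyInt <- toy_protein %>%
  mutate_at(vars(starts_with("Intensity")), log10) %>%  # log10 of every Intensity.* column
  mutate(H.M = Intensity.H + Intensity.M,               # sums of the log10 intensities
         M.L = Intensity.M + Intensity.L) %>%
  select(Uniprot, H.M, M.L) %>%
  gather(Ratio, Intensity, -Uniprot)                    # long format: one row per Uniprot and Ratio

toy_onlyInt
```

In current releases of dplyr and tidyr, mutate_at() and gather() are superseded by across() and pivot_longer(), but they still work, and this sketch keeps the functions named in the exercise.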

Use this as a template:

17.3 Process Ratios

Exercise 17.3 Using the dplyr functions, fill in the blanks to:

  • Select Uniprot and all columns that begin with Rat but do not end in Sig. (select())
  • Make a tidy data set with three columns Uniprot, Ratio and Expression (gather())
  • Remove all observations where Ratio is Ratio.H.L. (filter())
  • Rename the levels in Ratio to be M.L and H.M. (mutate() and recode_factor())
  • Group according to Ratio.
  • Log2-transform all Expression values and then shift them so that the values within each Ratio group are centered on zero. (mutate())
  • Assign this to onlyRatios.
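
A minimal sketch of these steps on made-up data might look like the following. The ratio values and the names toy_protein and toy_onlyRatios are hypothetical, and centering on the group mean is an assumption: the exercise only asks for values centered on zero, so a median shift would be another reasonable reading.

```r
library(dplyr)
library(tidyr)

# Hypothetical ratios, with matching *.Sig columns that should be dropped
toy_protein <- tibble(
  Uniprot       = c("P1", "P2"),
  Ratio.M.L     = c(1, 4),
  Ratio.H.M     = c(2, 8),
  Ratio.H.L     = c(2, 32),
  Ratio.M.L.Sig = c(0.5, 0.01),
  Ratio.H.M.Sig = c(0.2, 0.001),
  Ratio.H.L.Sig = c(0.9, 0.03)
)

toy_onlyRatios <- toy_protein %>%
  select(Uniprot, starts_with("Rat"), -ends_with("Sig")) %>%
  gather(Ratio, Expression, -Uniprot) %>%
  filter(Ratio != "Ratio.H.L") %>%
  mutate(Ratio = recode_factor(Ratio,
                               Ratio.M.L = "M.L",
                               Ratio.H.M = "H.M")) %>%
  group_by(Ratio) %>%
  # Assumption: center each group by subtracting its mean log2 value
  mutate(Expression = log2(Expression) -
           mean(log2(Expression), na.rm = TRUE)) %>%
  ungroup()   # optional: drop the grouping before later joins

toy_onlyRatios
```

Because the data are grouped by Ratio before the mutate() call, the mean is computed separately for M.L and H.M, so each group ends up centered on zero on its own.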

Use this as a template:

17.4 Process Significance Values and Merge

Exercise 17.4 Using the dplyr functions, fill in the blanks to:

  • Select Uniprot and all columns that end in Sig. (select())
  • Make a tidy data set with three columns Uniprot, Ratio and Significance (gather())
  • Remove all observations where Ratio is Ratio.H.L. (filter())
  • Rename the levels in Ratio to be M.L and H.M. (mutate() and recode_factor())
  • In the same mutate() call, make a new variable SigCat that cuts the Significance variable into groups using the breaks c(-Inf, 1e-11, 1e-4, 0.05, Inf) and the labels c("<1e-11", "<0.0001", "<0.05", "NS"). (cut())
  • Merge all this with the onlyRatios data frame (full_join())
  • Merge all this with the onlyInt data frame (full_join())
  • Remove any incomplete observations (i.e. with an NA anywhere) and any observations where Uniprot is empty. (filter() and complete.cases(.))
  • Arrange in descending order of Significance. (arrange() and desc())
  • Assign this to the object allData.
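
The whole sequence can be sketched on toy data as shown below. Every value is invented, toy_onlyRatios and toy_onlyInt are hand-built stand-ins for the results of the earlier exercises, and toy_allData is a hypothetical name. In this toy data the gathered column names keep their .Sig suffix, so the filter and recode steps use the suffixed names; adjust them to whatever names the real data produce.

```r
library(dplyr)
library(tidyr)

# Hypothetical significance values, including an NA and an empty Uniprot
toy_protein <- tibble(
  Uniprot       = c("P1", "P2", ""),
  Ratio.M.L.Sig = c(0.5, 1e-12, 0.01),
  Ratio.H.M.Sig = c(0.02, NA, 0.3),
  Ratio.H.L.Sig = c(0.9, 0.03, 0.5)
)

# Hand-built stand-ins for the outputs of the two earlier exercises
toy_onlyRatios <- tibble(
  Uniprot    = rep(c("P1", "P2", ""), times = 2),
  Ratio      = factor(rep(c("M.L", "H.M"), each = 3), levels = c("M.L", "H.M")),
  Expression = c(-1, 1, 0, 0.5, -0.5, 0)
)
toy_onlyInt <- tibble(
  Uniprot   = rep(c("P1", "P2", ""), times = 2),
  Ratio     = rep(c("M.L", "H.M"), each = 3),   # character key; the join coerces it
  Intensity = c(4, 3, 5, 5, 4, 6)
)

toy_allData <- toy_protein %>%
  select(Uniprot, ends_with("Sig")) %>%
  gather(Ratio, Significance, -Uniprot) %>%
  filter(Ratio != "Ratio.H.L.Sig") %>%          # suffixed name in this toy data
  mutate(Ratio = recode_factor(Ratio,
                               Ratio.M.L.Sig = "M.L",
                               Ratio.H.M.Sig = "H.M"),
         SigCat = cut(Significance,
                      breaks = c(-Inf, 1e-11, 1e-4, 0.05, Inf),
                      labels = c("<1e-11", "<0.0001", "<0.05", "NS"))) %>%
  full_join(toy_onlyRatios, by = c("Uniprot", "Ratio")) %>%
  full_join(toy_onlyInt, by = c("Uniprot", "Ratio")) %>%
  filter(complete.cases(.), Uniprot != "") %>%  # drop rows with any NA or empty Uniprot
  arrange(desc(Significance))

toy_allData
```

Naming the join keys with by = is optional, since full_join() would otherwise join on all shared columns and print a message saying which ones it used. Note that cut() uses right-closed intervals by default, so a value of exactly 0.05 falls into the "<0.05" group.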

Use this as a template: