What is Data Science?

1.9 What is data?

1.10 What is science?

1.11 Questions in data science

Questions about “the data”

Data Provenance

  • What, literally, is “the data”?

  • What is its provenance?

    • Who measured it?
    • How did they measure it?
      • Which measurement instrument?
      • Which model?
      • Which software and which version?
    • Where did it come from?
    • When was it measured?
    • How precise and reproducible were the measurement devices?
    • Why was it originally measured?
    • Who paid for it to be collected and/or analyzed?
    • Did they make any implicit or explicit assumptions?
    • Are there any potentially confounding variables that merit attention?
    • What have they, or others, done with the data?
      • Do we agree with these results?
      • Are they relevant to our goals?
      • What other questions can this data reasonably answer?
    • Are there suspicious and/or uninformative observations and/or variables?
      • How would I know?
      • Given my previous knowledge about the topic, what would I expect to see?
    • Can a mathematical relationship between variables be defined?
      • Does it have any real-world value? Should I care?
      • Do one or more variables predict another?
      • Are they actually relevant, or is it just due to chance? Do I care?
      • Why would I want to do that? Does it make real-world sense?
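
One common way to ask whether an apparent relationship between two variables is "just due to chance" is a permutation test. The Python sketch below is an illustration, not a prescribed method. It assumes two equal-length numeric variables, neither of them constant, and uses Pearson correlation as the measure of association. The function names `pearson_r` and `permutation_p_value` are hypothetical. Shuffling one variable destroys any real pairing with the other, so the shuffled correlations show how large |r| tends to get by chance alone.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for 'no association between x and y'.

    Each shuffle of y breaks its pairing with x; the p-value is the share of
    shuffles whose |r| is at least as large as the observed |r|.
    """
    rng = random.Random(seed)          # fixed seed so results are repeatable
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)               # copy so the caller's data is untouched
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(y_shuffled)
        if abs(pearson_r(x, y_shuffled)) >= observed:
            extreme += 1
    # Adding one to both counts keeps the estimate from ever being exactly zero.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    x = list(range(20))
    y = [2 * v for v in x]             # a perfect linear relationship
    print(pearson_r(x, y), permutation_p_value(x, y, n_permutations=2000))
```

A small p-value says only that the association is unlikely under pure shuffling. It does not answer the other questions in the list: whether the relationship has real-world value, whether a confounding variable explains it, or whether we should care.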

Data Validation

  • Is this raw data? How could I tell if it isn’t?
  • Are there any signs of data munging (processing)?
  • Are there any signs of suspicious data manipulation?
    • Does it matter?
  • What is the best way to describe each variable?
  • Are there interesting clusters in the observations?
    • Is that a relevant question?
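
As a starting point for "What is the best way to describe each variable?" and for spotting possible signs of processing, the Python sketch below summarizes one variable at a time and lists exact duplicate rows. It is a minimal illustration under stated assumptions, not a complete validation procedure. It assumes missing values are encoded as `None`, that rows are sequences of hashable values, and that a variable is numeric only if every non-missing value is an `int` or `float`. The names `describe_column` and `duplicate_rows` are hypothetical.

```python
import statistics
from collections import Counter


def describe_column(values):
    """Summarize one variable; None is assumed to mark a missing value."""
    present = [v for v in values if v is not None]
    summary = {
        "n": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if present and numeric:
        summary.update(
            min=min(present),
            max=max(present),
            mean=statistics.fmean(present),
            median=statistics.median(present),
        )
    else:
        # Non-numeric (or entirely missing) variable: report the most frequent value.
        summary["most_common"] = Counter(present).most_common(1)
    return summary


def duplicate_rows(rows):
    """Return each row that occurs more than once.

    Exact duplicates do not prove anything by themselves, but they can hint at
    a merge, append, or copy step somewhere in the data's history.
    """
    counts = Counter(tuple(row) for row in rows)
    return [row for row, count in counts.items() if count > 1]


if __name__ == "__main__":
    print(describe_column([1.0, 2.0, None, 3.0]))
    print(describe_column(["a", "b", "a"]))
    print(duplicate_rows([(1, "x"), (2, "y"), (1, "x")]))
```

Summaries like these describe the data but do not judge it. Whether a count of missing values or a duplicated row is suspicious still depends on what we know about how the data was collected.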

Analytical Questions