Warming-up#
2025.02.25
Lecture outline#
Intro#
What does the term “data science” mean to you?
So we have two terms here:
Statistics
Machine learning (ML)
How are they different from each other? For example, what is the difference between a “statistical model” and a “machine learning model”? Or are they the same thing, just fancy words created to confuse people?
What is the difference between supervised and unsupervised learning?
Checking your answers to Q4 in the Pre-course Quiz.
Data skills#
Here are topics that are not the primary focus of this class but are essential for building the data skills needed for reproducible science.
Programming
Collaboration and version control (Git & GitHub)
Jupyter (incl. Google Colab)
Data import and wrangling
Visualization
Basics of statistics#
Random variable: a quantity whose value depends on the outcome of a random process and therefore follows a probability distribution. Are the following quantities random variables?
The number of apples in a basket
Your height
The height of a person sitting here whom I pick at random
Now let’s revisit expectation, variance, and their mathematical expressions (cont. Q1 in Pre-course Quiz):
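For reference, these are the standard textbook definitions, written here for a continuous random variable $X$ with probability density $f(x)$. The symbols $X$, $f$, and $p$ are notation assumed for this sketch; the quiz may use different ones.

$$
\mathrm{E}[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx
$$

$$
\mathrm{Var}(X) = \mathrm{E}\left[(X - \mathrm{E}[X])^2\right] = \mathrm{E}[X^2] - \left(\mathrm{E}[X]\right)^2
$$

For a discrete random variable with probability mass function $p(x_i)$, the integral becomes a sum: $\mathrm{E}[X] = \sum_i x_i \, p(x_i)$.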
More about probability distributions#
Gaussian: sums and averages of many independent random variables converge to this distribution, thanks to the central limit theorem
How does this relate to Q2 in the Pre-course Quiz?
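Student's t-distribution: tied to the sample mean; it describes the standardized sample mean when the variance is estimated from a small sample, and has heavier tails than the Gaussian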
Beta and Gamma distributions: different supports (Beta on the interval from 0 to 1, Gamma on positive values) for different kinds of environmental variables
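The points above can be explored with a small simulation. This is a minimal sketch using only Python's standard library, not part of the course materials: the exponential source distribution, the sample sizes, the Beta and Gamma shape parameters, and the helper names `sample_means` and `skewness` are all arbitrary choices made for illustration.

```python
import math
import random
import statistics


def sample_means(draw, n_per_mean, n_means, rng):
    """Return n_means averages, each taken over n_per_mean draws."""
    return [
        statistics.fmean([draw(rng) for _ in range(n_per_mean)])
        for _ in range(n_means)
    ]


def skewness(values):
    """Sample skewness: close to 0 for a symmetric (e.g. Gaussian) shape."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


rng = random.Random(0)  # fixed seed so the run is repeatable


def draw_exp(r):
    """One draw from an exponential distribution with rate 1 (strongly skewed)."""
    return r.expovariate(1.0)


# Central limit theorem: averages of skewed draws look far more symmetric.
raw = [draw_exp(rng) for _ in range(20000)]
means = sample_means(draw_exp, n_per_mean=50, n_means=5000, rng=rng)

skew_raw = skewness(raw)
skew_means = skewness(means)
spread_means = statistics.pstdev(means)  # theory: 1 / sqrt(50)

print(f"skewness of raw draws:    {skew_raw:.2f}")
print(f"skewness of sample means: {skew_means:.2f}")
print(f"std of sample means: {spread_means:.3f} (theory {1 / math.sqrt(50):.3f})")

# Supports: Beta draws stay within [0, 1]; Gamma draws are positive.
beta_draws = [rng.betavariate(2.0, 5.0) for _ in range(1000)]
gamma_draws = [rng.gammavariate(2.0, 1.5) for _ in range(1000)]

print(f"Beta range:  {min(beta_draws):.3f} to {max(beta_draws):.3f}")
print(f"Gamma range: {min(gamma_draws):.3f} to {max(gamma_draws):.3f}")
```

Changing `n_per_mean` shows how quickly the averages approach a symmetric, bell-shaped distribution. The supports suggest where each distribution might fit: Beta for fractions such as cloud cover, Gamma for non-negative amounts such as precipitation (both are illustrative examples, not taken from the lecture).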
Group discussion & Demos#
Discuss how you would reproduce the figure for Q3 in the Pre-course Quiz.
Get the data necessary for doing Exercise 4.5 in Hsieh’s book.