Monthly Archives: January 2022

Module 05: Statistical Inference

January 30, 2022 | Module | Chang Xu

Prepare (due M 2/7)
1. Content below
2. Sakai quizzes
Peer Instructions – See on the class forum
Homework (due Su 2/13)
Worked Example

Content

5.A – Confidence Intervals and Bootstrapping

Intro Confidence Intervals (17 min.)
Confidence Intervals in Python (17 min.)
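
As a rough illustration of the bootstrapping idea named in 5.A, the sketch below computes a percentile bootstrap confidence interval for a mean using only the Python standard library. The function name bootstrap_ci, its parameters, and the data are hypothetical choices made for this example. The videos may use a different method or library.

```python
import random
import statistics


def bootstrap_ci(sample, num_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(num_resamples)
    )
    # Cut off (1 - confidence) / 2 of the resampled means in each tail.
    tail = (1 - confidence) / 2
    lower = means[int(tail * num_resamples)]
    upper = means[int((1 - tail) * num_resamples) - 1]
    return lower, upper


# Hypothetical data, made up for illustration only.
heights = [61, 64, 65, 66, 67, 68, 68, 69, 70, 71, 72, 74]
low, high = bootstrap_ci(heights)
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

Increasing num_resamples makes the interval endpoints more stable at the cost of more computation.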

5.B – Hypothesis Testing

Intro Hypothesis Testing and Proportions (14 min.)
Hypothesis Testing Means and More (33 min.)

Optional Supplements

You can access OpenIntro Statistics, an excellent free online textbook co-authored by Duke faculty, here. A tablet-friendly PDF is available for a suggested but adjustable price, and the regular PDF is free. For this module, the following optional readings may be particularly helpful supplements:

Chapter 5.2: Confidence intervals for a proportion. This provides introductory material on confidence intervals, elaborating on the first video in 5.A.
Chapter 5.3: Hypothesis testing for a proportion. This elaborates on the introduction to hypothesis testing from the first video in 5.B.
Chapters 7.1, 7.3, and 7.5 cover material from the second video in 5.B on using t-tests for a single mean, the difference of two means, and many pairwise means, respectively.
Chapter 6.3 discusses the chi-square test for categorical data introduced in the second video in 5.B.

In addition, here is the documentation for the scipy.stats library, which implements most of the functionality described here as well as many other useful statistical functions.
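
As a small, hedged example of the kind of functionality scipy.stats offers for this module, the sketch below computes a t-based confidence interval for a mean and runs a one-sample t-test. The data and the null value 5.0 are made up for illustration. The calls used (stats.sem, stats.t.interval, stats.ttest_1samp) exist in scipy.stats, but the videos may use different functions.

```python
import numpy as np
from scipy import stats

# Hypothetical sample, made up for illustration only.
sample = np.array([5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 4.7, 5.3, 5.9, 5.4])

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean (uses ddof=1)

# 95% t-based confidence interval for the population mean.
ci_low, ci_high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)

# One-sample, two-sided t-test of H0: the population mean equals 5.0.
result = stats.ttest_1samp(sample, popmean=5.0)
t_stat, p_value = result.statistic, result.pvalue

print(f"mean = {mean:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

The two results are linked: the two-sided test rejects at the 5% level exactly when 5.0 falls outside the 95% interval.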

Module 04: Data Wrangling

January 22, 2022 | Module | Dr Kristin Stephens-Martinez, Ph.D.

Prepare (due M 1/31)
1. Content below
2. Sakai quizzes
Peer Instructions – See on the class forum
Homework (due Su 2/6)
Worked Example

Content (Slides in the Box folder)

4.A – What is Wrangling

Data sources, formats, and importing (26 min.)
Common data cleaning problems (16 min.)
Read Section 3.4 Handling Missing Data from Python Data Science Handbook

4.B – Wrangling Text

Python string operations (16 min.)
Introduction to regular expressions (18 min.)
Read Section 3.10 Vectorized String Operations from Python Data Science Handbook

Optional Supplements

Module 03: Probability

January 12, 2022 | Module | Dr Kristin Stephens-Martinez, Ph.D.

Prepare (due M 1/24)
1. Content below
2. Sakai quizzes
Peer Instructions – See on the class forum
Homework (due Su 1/30)
Worked Examples

Content (Slides in the Box folder)

3.A – Foundations of Probability (52 min.)

Outcomes, Events, Probabilities (15 min.)
Joint and Conditional Probability (11 min.)
Marginalization and Bayes’ Theorem (15 min.)
Random Variables and Expectations (11 min.)

3.B – Distributions of Random Variables (46 min.)

Distributions, Means, Variance (19 min.)
Monte Carlo Simulation (15 min.)
Central Limit Theorem (12 min.)
1. Slide 26 in the video has a typo that is fixed in the pdf version of the slides on Box. In the video, it says the probability is <= 0.95, but it should say < 0.05.
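
To connect the Monte Carlo Simulation and Central Limit Theorem topics in 3.B, here is a minimal sketch that simulates many sample means of fair die rolls and compares their spread with what the theorem predicts. The die example, the seed, and the trial counts are assumptions chosen for illustration and are not taken from the videos.

```python
import random
import statistics

rng = random.Random(216)  # arbitrary seed for reproducibility


def sample_mean_of_die_rolls(num_rolls):
    """Mean of `num_rolls` fair six-sided die rolls."""
    return statistics.fmean(rng.randint(1, 6) for _ in range(num_rolls))


num_rolls = 30
num_trials = 5000
means = [sample_mean_of_die_rolls(num_rolls) for _ in range(num_trials)]

# Theory: a single roll has mean 3.5 and variance 35/12, so the mean of
# n rolls has standard deviation sqrt(35 / 12 / n).
expected_sd = (35 / 12 / num_rolls) ** 0.5
observed_sd = statistics.stdev(means)

# Under a normal approximation, about 95% of sample means should fall
# within 1.96 standard deviations of 3.5.
within = sum(abs(m - 3.5) <= 1.96 * expected_sd for m in means) / num_trials

print(f"expected sd = {expected_sd:.3f}, observed sd = {observed_sd:.3f}")
print(f"fraction within 1.96 sd of 3.5: {within:.3f}")
```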

Optional Supplements

You can access OpenIntro Statistics, an excellent free online textbook co-authored by Duke faculty, here. A tablet-friendly PDF is available for a suggested but adjustable price, and the regular PDF is free. For this module, the following optional readings may be particularly helpful supplements:

Chapter 3: Probability. This provides more information on many of the topics from the above videos in Foundations of Probability.
Chapter 4: Distributions of random variables. This provides much more information about particular classic distributions than is provided in the first video in 3.B.
Chapter 5.1: Point estimates and sampling variability. This provides more information on some of the topics from the second and third videos in 3.B.

In addition, you can find documentation for the two pseudorandom number generating / sampling libraries in Python that we mentioned here:

Python random – Base Python library
Numpy random – Numpy random sampling library
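
For a quick side-by-side of the two libraries, the sketch below draws seeded die rolls and samples without replacement from each. It uses NumPy's Generator interface via np.random.default_rng, which is an assumption about which interface the videos use. The seed and the list of names are hypothetical.

```python
import random

import numpy as np

# Base Python: a seeded generator that returns plain Python values.
py_rng = random.Random(42)
py_rolls = [py_rng.randint(1, 6) for _ in range(10)]  # both endpoints inclusive

# NumPy: a seeded Generator that returns a whole array at once.
np_rng = np.random.default_rng(42)
np_rolls = np_rng.integers(1, 7, size=10)  # upper bound is exclusive

# Sampling without replacement from a small population.
names = ["Ana", "Ben", "Cy", "Dee", "Eli"]  # hypothetical values
py_pick = py_rng.sample(names, k=2)
np_pick = np_rng.choice(names, size=2, replace=False)

print(py_rolls, py_pick)
print(np_rolls, np_pick)
```

Note the differing endpoint conventions: random.randint includes the upper bound, while Generator.integers excludes it by default.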

Module 02: Numpy & Pandas

January 9, 2022 | Module | Dr Kristin Stephens-Martinez, Ph.D.

Prepare (due M 1/17)
1. Content below
2. Sakai quizzes
Peer Instructions
1. DataFrame Indexing: Round 1, Round 2
2. Series Adding: Round 1, Round 2
3. hstack/vstack: Round 1, Round 2
4. Slicing: Round 1, Round 2
Homework (due Su 1/23)
Worked Example

Content (Slides in the Box folder)

2.A – Numpy (1 hour)

Why Numpy (8 min.)
Numpy Array Basics (15 min.)
Numpy Universal Functions (20 min.)
Numpy Axis (14 min.)

2.B – Pandas (45 min.)

Why Pandas (7 min.)
Pandas Series (19 min.)
Pandas Dataframe (21 min.)

Optional Supplements

Numpy Beginner’s Tutorial
Chapter 2: Introduction to Numpy from Python Data Science Handbook
Numpy Documentation
10 Minutes to Pandas Tutorial
Pandas User Guide
Chapter 3: Data Manipulation with Pandas from Python Data Science Handbook (just the first three subsections)

Module 01: What is Data Science, Anaconda, Python, & Jupyter

January 4, 2022 | Module | Dr Kristin Stephens-Martinez, Ph.D.

Prepare (due M 1/10)
1. Content below
2. Quiz is on Sakai
3. Install Anaconda
Peer Instructions (these will open when we use them)
1. lambda with min/max: Round 1, Round 2
2. Sorting: Round 1, Round 2
3. Notebooks I: Round 1, Round 2
4. Notebooks II: Round 1, Round 2
Homework (due Su 1/16)

Content (Slides in the Box folder)

1.A – What is Data Science? (in-class on 1/7 or see recording)

1.B – Python3 (12 min.)

1.C – Python for Data Science

Anaconda and Jupyter (10 min.)
Jupyter Notebook Demo (11 min.)

CompSci216 Everything Data, Spring 2022
