Author: Ruixin Zhang

Module 09: Databases and SQL

  1. Prepare (due Mon 11/6)
    1. Content below
    2. Sakai quizzes
  2. Peer Instructions – See on the class forum
  3. Homework (due Sun 11/12) [LINK]
  4. Worked Example [LINK]

Content

09.A – Predictive Modeling and Regression

  1. Relational Database (24 min.)

09.B – Machine Learning and Classification

  1. SQL Querying (21 min.)
  2. SQL with Python and Pandas (12 min.)

Optional Supplements

Module 07: Statistical Inference

  1. Prepare (due Mon 10/16)
    1. Content below
    2. Canvas quizzes
  2. Peer Instructions – See on the class forum
  3. Homework (due Sun 10/22) [Link]
  4. Worked Example [Link]

Content

Note: the slides for this module have been updated. Please switch to the “slides” panel when viewing the video in Panopto. DO NOT stay on the “screen” panel, as the recorded screen shows the old slides (which contain typos and outdated information).

07.A – Confidence Intervals and Bootstrapping

  1. Intro Confidence Intervals (17 min.)
  2. Confidence Intervals in Python (17 min.)
  3. Misconceptions about Confidence Intervals (short read)
    OR
    The 3rd paragraph (starting with “As a technical note…”) in this link

07.B – Hypothesis Testing

  1. Intro Hypothesis Testing and Proportions (14 min.)
  2. Hypothesis Testing Means and More (33 min.)

Optional Supplements

You can access an excellent free online textbook on OpenIntro Statistics here, co-authored by Duke faculty. You can pay a suggested but adjustable price for a tablet-friendly pdf, but you can also just get the regular pdf for free. For Module 7, the following optional readings may be particularly helpful supplements:

  • Chapter 5.2 Confidence intervals for a proportion. This provides introductory material on confidence intervals, elaborating on 07.A.1.
  • Chapter 5.3 Hypothesis testing for a proportion. This elaborates on the introduction to hypothesis testing from 07.B.1.
  • Chapters 7.1, 7.3, and 7.5 cover material from 07.B.2 on using t-tests for a single mean, the difference of two means, and many pairwise means, respectively.
  • Chapter 6.3 discusses the chi-square test for categorical data introduced in 07.B.2.

In addition, here is the documentation for the scipy.stats library that implements most of the functionality described here as well as many other useful statistical functions.
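
As a rough illustration of how scipy.stats covers the topics in this module, the sketch below computes a confidence interval for a mean, runs one-sample and two-sample t-tests, and runs a chi-square test. The data, the seed, and the variable names are made up for the example and do not come from the videos or the homework.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data, generated with a fixed seed so the run is repeatable.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=5.0, scale=2.0, size=40)
group_b = rng.normal(loc=6.0, scale=2.0, size=40)

# 95% confidence interval for the mean of group_a, using the t distribution
# with n - 1 degrees of freedom and the standard error of the mean.
mean_a = group_a.mean()
ci_low, ci_high = stats.t.interval(
    0.95, len(group_a) - 1, loc=mean_a, scale=stats.sem(group_a)
)

# One-sample t-test: is the mean of group_a different from 5.0?
one_sample = stats.ttest_1samp(group_a, popmean=5.0)

# Two-sample t-test (Welch's version, which does not assume equal variances).
two_sample = stats.ttest_ind(group_a, group_b, equal_var=False)

# Chi-square test of independence on a hypothetical 2x2 table of counts.
table = np.array([[30, 10], [20, 20]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

print(f"95% CI for mean of group_a: ({ci_low:.2f}, {ci_high:.2f})")
print(f"one-sample t-test p-value: {one_sample.pvalue:.3f}")
print(f"two-sample t-test p-value: {two_sample.pvalue:.3f}")
print(f"chi-square p-value: {p_chi2:.3f} (dof = {dof})")
```

A small p-value suggests the data would be unusual if the null hypothesis were true; it is not the probability that the null hypothesis is true.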

Module 05: Probability

  1. Prepare (due Mon 9/25)
    1. Content below
    2. Sakai quizzes
  2. Video of the piece that got lost from Wednesday’s class
  3. Peer Instructions – See on the class forum
  4. Homework (due Sun 10/1) [Link]
  5. Worked Examples [Link]

Content (Slides in the Box folder)

5.A – Foundations of Probability (52 min.)

  1. Outcomes, Events, Probabilities (15 min.)
  2. Joint and Conditional Probability (11 min.)
  3. Marginalization and Bayes’ Theorem (15 min.)
  4. Random Variables and Expectations (11 min.)

5.B – Distributions of Random Variables (46 min.)

  1. Distributions, Means, Variance (19 min.)
  2. Monte Carlo Simulation (15 min.)
  3. Central Limit Theorem (12 min.)
    1. Slide 26 in the video has a typo that is fixed in the pdf version of the slides on Box. In the video, it says the probability is <= 0.95, but it should say < 0.05.

Optional Supplements

Helpful YouTube videos to understand nuance with examples

Online Textbook and Documentation

You can access an excellent free online textbook on OpenIntro Statistics here, co-authored by Duke faculty. You can pay a suggested but adjustable price for a tablet-friendly pdf, but you can also just get the regular pdf for free. For this module, the following optional readings may be particularly helpful supplements:

  • Chapter 3: Probability. This provides more information on many of the topics from the above videos in Foundations of Probability.
  • Chapter 4: Distributions of random variables. This provides much more information about particular classic distributions than is provided in 5.B.1.
  • Chapter 5.1: Point estimates and sampling variability. This provides more information on some of the topics from 5.B.2-3.
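
The Monte Carlo simulation and Central Limit Theorem topics from 5.B can be tried out with a few lines of Python. The sketch below is only an illustration under assumed settings (a Uniform(0, 1) population, sample size 30, 5000 trials, and a fixed seed); none of these choices come from the videos.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is repeatable


def sample_mean(n):
    """Mean of n independent draws from Uniform(0, 1)."""
    return statistics.fmean(random.random() for _ in range(n))


n = 30         # size of each sample (assumed for this example)
trials = 5000  # number of Monte Carlo repetitions (assumed)
means = [sample_mean(n) for _ in range(trials)]

# Uniform(0, 1) has mean 1/2 and variance 1/12, so the Central Limit Theorem
# suggests the sample means are roughly normal with the parameters below.
theory_mean = 0.5
theory_sd = (1 / 12) ** 0.5 / n ** 0.5

observed_mean = statistics.fmean(means)
observed_sd = statistics.stdev(means)

# Fraction of simulated means within 1.96 standard deviations of the
# theoretical mean; a normal distribution would put about 95% there.
within = sum(abs(m - theory_mean) <= 1.96 * theory_sd for m in means) / trials

print(f"mean of sample means: {observed_mean:.4f} (theory {theory_mean})")
print(f"sd of sample means:   {observed_sd:.4f} (theory {theory_sd:.4f})")
print(f"fraction within 1.96 sd: {within:.3f}")
```

Changing n shows how the spread of the sample means shrinks in proportion to 1/sqrt(n).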

In addition, you can find documentation for the two pseudorandom number-generating / sampling libraries in Python that we mentioned here:

Module 03: Visualization

  1. Prepare (due Mon 9/16)
    1. Content below
    2. Sakai quizzes
  2. Class engagement – See on the class forum
  3. Homework (due Sun 9/22) [Link]
  4. Worked Examples [Link]

Content

03.A – Data Visualization and Design

  1. Why Visualize? (11 min.)
  2. Basic Plot Types (17 min.)
  3. Dos and Don’ts (10 min.)

03.B – Visualization in Python

  1. Intro to Python Visualization Landscape (7 min.)
  2. Seaborn Introduction (17 min.)
  3. Seaborn Examples (17 min.)

Optional Supplements
