Monthly Archives: September 2021

Module 6: Sampling, Simulation, Hypothesis Testing, Comparing Distributions, Decisions and Uncertainty, and A/B Testing

Note: Due to fall break, there will be no class on Tuesday 10/5, so this module spans 1.5 weeks. It also has somewhat more content than usual, so plan accordingly.

  1. Videos
  2. Textbook (supplemental)
  3. Homework (Due Tuesday 10/5, late 10/6)
    1. Part 1: Sampling and Simulation
    2. Part 2: Hypothesis Testing
    3. Part 3: Comparing Distributions
    4. Part 4: Decisions and Uncertainty
    5. Part 5: A/B Testing
  4. Group Worksheet
  5. Lab 06 (In containers, Due Friday 10/15)


Part 1a: Sampling

  1. Probability & Sampling (2:28)
  2. Sampling (6:47)

Part 1b: Simulation

  1. Distributions (3:48)
  2. Large Random Samples (5:25)
  3. Simulation (2:53)
  4. Statistics (7:25)

Part 2: Hypothesis Testing

  1. Assessing Models (3:12)
  2. A Model about Random Selection (13:58)
  3. A Genetic Model (15:44)
  4. Example (Optional, 3:59)

Part 3: Comparing Distributions

  1. Introduction (6:03)
  2. Total Variation Distance (12:54)
    1. Alternatively, you can read two sections of Ch 11.2: "Comparison with Panels Selected at Random" and "A New Statistic: The Distance between Two Distributions." Your main goal is to understand what the total variation distance (TVD) statistic is.
  3. Assessment (Optional, 15:32)
  4. Summary (2:48)
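
As a rough illustration of the TVD statistic from this part, here is a minimal sketch in plain Python. It assumes the usual definition: half the sum of the absolute differences between two distributions over the same categories. The function name and the example proportions are made up for illustration and are not taken from the videos or the textbook.

```python
def total_variation_distance(dist_a, dist_b):
    """Return half the sum of absolute differences between two distributions.

    Both arguments are sequences of proportions over the same categories,
    listed in the same order.
    """
    if len(dist_a) != len(dist_b):
        raise ValueError("Both distributions must cover the same categories.")
    return sum(abs(a - b) for a, b in zip(dist_a, dist_b)) / 2


# Hypothetical proportions across four categories (illustrative values only).
eligible = [0.25, 0.25, 0.25, 0.25]
panel = [0.40, 0.30, 0.20, 0.10]

tvd = total_variation_distance(eligible, panel)
print(f"TVD between the two distributions: {tvd:.2f}")
```

A TVD of 0 means the two distributions are identical, and larger values mean they are further apart.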

Part 4: Decisions and Uncertainty

  1. Introduction and Terminology (10:31)
  2. Performing a Test (11:58)
    1. Alternatively, you can read the section The GSI’s Defense in Ch 11.3. The main point of this video is that deciding whether or not to reject the null hypothesis is sometimes not straightforward. Below is a histogram of the averages obtained by simulating a section’s average grade from the data. The red marks a section that claims its average was lower than is consistent with the overall data. But is it? How small is small enough to reject the hypothesis?
  3. Statistical Significance (11:06)
  4. An Error Probability (8:55)
  5. Origin of the Conventions (Optional, 4:08)
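
To make the "how small is small enough" question from this part concrete, here is a minimal sketch of a simulation-based test in plain Python. It assumes the null hypothesis is that a section's scores look like a random sample drawn without replacement from the whole class. The scores, the section size, the observed average, and the function names are all hypothetical and are not the course's data or code.

```python
import random


def simulate_section_average(all_scores, section_size, rng):
    """Average score of one section drawn at random, without replacement."""
    return sum(rng.sample(all_scores, section_size)) / section_size


def empirical_p_value(all_scores, section_size, observed_average,
                      repetitions=5000, seed=0):
    """Proportion of simulated averages at or below the observed average."""
    rng = random.Random(seed)
    at_or_below = 0
    for _ in range(repetitions):
        simulated = simulate_section_average(all_scores, section_size, rng)
        if simulated <= observed_average:
            at_or_below += 1
    return at_or_below / repetitions


# Made-up class scores and a made-up observed section average.
score_rng = random.Random(42)
class_scores = [score_rng.randint(0, 25) for _ in range(350)]
observed = 10.0

p_value = empirical_p_value(class_scores, section_size=27,
                            observed_average=observed)
print(f"Empirical p-value: {p_value}")
```

The p-value is the chance, under the null hypothesis, of seeing a section average at least as low as the observed one. Whether that chance is small enough to reject the null hypothesis depends on a cutoff chosen before running the test.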

Part 5a: A/B Testing

  1. Introduction (9:59)
  2. Hypotheses and Statistic (4:39)
  3. Performing the Test (15:59)

Part 5b: Deflategate Example (Optional)

  1. Deflategate Introduction (Optional, 12:59)
  2. Deflategate Testing (Optional, 11:09)


Group Reflection 01

With Project 1 almost over, it is time to reflect on how working in your group went. The reflection form will be available on Gradescope starting on 9/28 (the day after the project is due). It is due Friday 10/01 at 11:59 pm, with late submissions accepted until 10/03 at 11:59 pm.

The purpose of the reflection is to help you, your future group, and Prof. Stephens-Martinez understand how to improve group work and whether the group contract was a worthwhile exercise.

Module 5: Iteration and Probability

  1. Videos
  2. Textbook (supplemental)
  3. Homework (Due Sunday 9/26, late 9/27)
    1. Part 1: Iteration
    2. Part 2: Probability
  4. Group Worksheet
  5. Lab 05 (Due Friday 10/01)


Part 0: Table Examples (Optional)

  1. Table Method Review (Optional, 7:17)
  2. Discussion Question (6:20)
  3. Old Midterm Question (7:46)
  4. Advanced Where (9:00)

Part 1: Iteration

  1. Comparison (6:16)
  2. Predicates (2:08)
  3. Random Selection (4:55)
  4. Random Selection Discussion (4:15)
  5. Print (2:43)
  6. Control Statements (6:06)
  7. For Statements (10:09)

Part 2: Probability

  1. Monty Hall Problem (12:47)
  2. Probability (1:43)
  3. Multiplication Rule (3:45)
  4. Addition Rule (1:28)
  5. Probability Example (1:59)


Module 4: Groups, Pivots, and Joins

  1. Videos
  2. Textbook (supplemental)
  3. Homework (Due Sunday 9/19, late 9/20)
    1. Part 1a: Groups
    2. Part 1b: Pivot Tables
    3. Part 2: Joins
  4. Group Worksheet
  5. Lab 04 (In containers, Due Friday 9/24)


Part 1a: Groups

  1. One Attribute Group (14:39)
  2. Cross Classification (11:08)
  3. Example 1 (Optional, 5:04)

Part 1b: Pivot Tables

  1. Pivot Tables (13:09)
  2. Example 2 (Optional, 5:35)
  3. Comparing Distributions (12:00)

Part 2: Joins

  1. Joins (10:28)
  2. Bikes (9:41)
  3. Shortest Trips (4:59)
  4. Maps (9:36)


Exam 1

This post outlines what Exam 1 will be like. Because the format is likely new to most people, we will hold a mock exam during class on Tuesday 9/14 so you can learn the process and format.

Exam Logistics

  • The exam will cover Modules 1 through 3 inclusive.
  • The exam will be take-home. It is open book, open note, open internet, but closed to people.
    • This means you cannot communicate with another person while taking the exam. That includes asking for or receiving help from someone over the Internet (for example, posting a question on Stack Overflow).
  • Timeframe: It must be completed on Thursday 9/16 between 10:15 am (start of class) and 11:59 pm.
    • The exam will close at 11:59 pm regardless of when you started.
  • The exam has two parts: Multiple Choice and Jupyter Notebook.
    • You may take a break between each part.
    • Both parts are timed through Sakai.
  • The exam must be done individually. It is a violation of class policy if you collaborate in any way with another person (in or not in the class) on the exam. You can only talk to the teaching staff about the exam.
  • Protect the integrity of the exam and your exam submission.
    • Do not talk to anyone about the exam during the exam period.
    • Take your exam in a secure location where no one can bother you.
    • Take your exam in a place where you will not be distracted or tempted to talk to someone.
  • The exam has randomization elements in it so no one’s exam will be identical to another person’s.
  • If you have a question during the exam, ask it as a new private message on the class forum, or on Zoom if a teaching staff member is on call at that time.
    • We will do our best to always have someone checking the forum, but we cannot promise that someone will answer your question instantly.
    • Prof. Stephens-Martinez will be in the class Zoom during class time and in her office hours Zoom during her office hours, which are immediately after Thursday’s class.
    • David has office hours Thursday 5-6 pm ET.
    • The exam is tested for readability, so the wording should be straightforward.

Multiple Choice Questions (30 minutes)

  • You will have 30 minutes to complete this part.
  • It will be a Sakai Quiz (like the homework).
  • You can submit only once.
  • You will not see your score until after the testing period is over.

Jupyter Notebook (30 minutes)

  • You will have 30 minutes to complete this part.
  • You will get your Jupyter Notebook zip file inside a Sakai Quiz that is not the multiple-choice part.
  • You will submit it on Gradescope.
  • You can rely on the Sakai Quiz timer to tell you how much time you have left.
  • We will use your logged start time in Sakai to track if you submitted on Gradescope on time.
  • You do not need to do anything with Sakai after you retrieve your zip file from the quiz.
  • During your testing period, you can submit as many times as you want to Gradescope. We will take your last submission.
  • The autograder will tell you if your values are the correct type, but not necessarily if they are the correct value. There are hidden tests. Your score will only be revealed after we have finished all grading, including the manual grading part.

How to Prepare

  • First, do all of your assigned work. The homework and labs are there to help you learn the material. The exam is to check if you actually learned it. Therefore, if you did the homework and labs and ensured you understood it, you will do fine on the exam.
  • We will make copies of all of the homework for studying. You can submit to these copies an unlimited number of times.
    • The best practice is to do the homework without looking at your notes.
    • Anything you got wrong is information on what you need to work on and focus on studying.
  • The class Box folder has unsolved versions of Lab 2 and 3. You can download and do these again.
    • By redoing them without looking at your prior solution, you can find out what you are struggling with. Any time you can’t easily rewrite the answer, that tells you what you need to study more.
  • If you are struggling with something, ask on the class forum or go to office hours. You can also answer questions on the class forum yourself; articulating an answer is a great way to check your learning!
  • [Optional] Work on Project 1. It spans modules 1-3, which is what the exam will cover. However, it is not due until 9/24 so you should not feel compelled to finish it before the exam.

Mock Exam Logistics

Please come to class. This way if any hiccups occur the teaching staff will be available to help you figure it out.

If you have an SDAO accommodation, you should see that accommodation reflected in this mock exam. If you do not, notify Prof. Stephens-Martinez immediately.

Project 1

The zip file will be in the class Box folder in the Project folder. You will submit this as a group, similar to the group contract. This covers modules 1 through 3.

To work collaboratively, you can choose to use Google Colab. Put the file in your Google Drive and share it with your group. When you open the file, it will open in Google Colab. You should all be able to work on the notebook at the same time. However, editing the same cell at the same time may not work. You may notice that the data files are loaded from locations on the internet rather than from local files. This change makes working with Colab easier, because Colab does not hold onto data files between uses.

Group Contract

To set expectations at the beginning, your group will fill out a group contract.

  1. Go to the group contract template. Read and discuss as a group what you all will do.
  2. One of your group mates should click on this link to make a Google doc copy of the contract for your own group.
  3. Make sure all group mates can edit the Google doc and fill it out.
  4. Download a PDF of the group contract and submit it as a group to Gradescope.
    1. If you do not know how to add group members to an assignment, Gradescope has a help page for this.

Due: 9/10 11:59 pm