All posts by Dr Kristin Stephens-Martinez, Ph.D.

Mini-Exam 1

This post outlines what Mini-Exam 1 will be like.

Exam Logistics

  • Modules covered: 1, 2, and 3
  • The exam will be take-home. It is open book, open note, open internet, but closed to people.
    • This means you cannot communicate with anyone about the exam, including asking for or receiving help through the Internet (for example, on Stack Overflow).
  • Timeframe: It will open Thursday 2/10, 12:01 AM, and close Saturday 2/12, 11:59 PM.
    • The exam will close at 11:59 PM regardless of when you started.
  • The exam consists of 2 parts.
    • Part 1 consists of only a Jupyter Notebook.
    • Part 2 consists of a Jupyter Notebook and a data set.
    • You will get the zip files inside a Sakai Quiz.
    • You will submit them on Gradescope.
    • You will have 2 hours for each exam part.
      • We do not expect you to need the entire 2 hours for each part. However, it is not uncommon to get lost in a data set, and we wanted to account for that.
    • You can rely on the Sakai Quiz timer to tell you how much time you have left.
    • We will use your logged start time in Sakai to track if you submitted on Gradescope on time.
      • If you submit after your allotted time, we will grade the last submission you made within your allotted time. If you have no submission within your time limit, that part will be marked as zero (so you will need to rely on the retake for your exam).
      • We recommend you submit to Gradescope periodically (after each problem) so you are not scrambling at the end trying to open Gradescope.
    • You do not need to do anything with Sakai after you retrieve your zip file from the quiz.
    • During your testing period, you can submit as many times as you want to Gradescope. We will take the submission you mark as active, which is your last submission unless you change it using the history.
  • The exam must be done individually. It is a violation of class policy if you collaborate in any way with another person (whether or not they are in the class) on the exam. You can only talk to the teaching staff about the exam.
  • Protect the integrity of the exam and your exam submission.
    • Do not talk to anyone about the exam during the exam period.
    • Take your exam in a secure location where no one can bother you.
    • Take your exam in a place where you will not be distracted or tempted to talk to someone.
  • If you have a question during the exam, ask it as a new private message on the class forum, or on Zoom if a teaching staff member is on call at that time.
    • We will do our best to always have someone checking the forum. However, we cannot promise that someone will answer your question instantly.
    • The exam is tested for readability, so the wording should be straightforward.
  • Mini-Exam Retake 1 will take place during Mini-Exam 2. Your Mini-Exam 1 score will be the higher of your score on this exam and your score on the retake.
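
As a rough illustration of the timing rule in the logistics above, here is a minimal sketch of how the graded submission for one exam part could be picked. The function name `graded_submission`, its parameters, and the example timestamps are all hypothetical; it is not the staff's actual tooling. It also assumes the window ends at whichever comes first, 2 hours after your Sakai start time or the overall exam close, and it ignores the option of marking an earlier submission as active.

```python
from datetime import datetime, timedelta

TIME_LIMIT = timedelta(hours=2)  # time allowed per exam part


def graded_submission(start, submissions, close=None, time_limit=TIME_LIMIT):
    """Return the latest on-time submission timestamp, or None if there is none.

    start       -- logged Sakai start time (hypothetical input)
    submissions -- list of Gradescope submission timestamps (hypothetical input)
    close       -- optional overall exam close time (assumed to cap the window)
    """
    deadline = start + time_limit
    if close is not None:
        deadline = min(deadline, close)
    on_time = [t for t in submissions if start <= t <= deadline]
    # None means nothing was submitted in time, so the part would score zero.
    return max(on_time) if on_time else None


# Illustrative timestamps only.
start = datetime(2022, 2, 11, 9, 0)
submissions = [
    datetime(2022, 2, 11, 9, 30),
    datetime(2022, 2, 11, 10, 45),
    datetime(2022, 2, 11, 11, 5),  # after the 2-hour window
]
chosen = graded_submission(start, submissions)
print(chosen)
```

Submitting after each problem, as recommended above, means there is always a recent on-time submission to fall back on.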

Grading Scale and Points Allocation

Each section will be graded on a four-step rubric scale as follows.

  • E (Exemplary) – Work that meets all requirements and displays full mastery of all learning goals and material.
  • S (Satisfactory) – Work that meets all requirements and displays at least partial mastery of all learning goals as well as full mastery of core learning goals.
  • N (Not yet) – Work that does not meet some requirements and/or displays developing or incomplete mastery of at least some learning goals and material.
  • U (Unassessable) – Work that is missing, does not demonstrate meaningful effort, or does not provide enough evidence to determine a level of mastery.

There are ~100 points possible and fewer than 10 questions. Points are distributed evenly across the concepts being tested, so each problem's point value depends on the number of concepts it tests. The rubric will be converted to points as follows:

  • E = full credit
  • S = E_full_credit – 1
  • N = E_full_credit * 0.6
  • U = E_full_credit * 0.2
  • Blank = 0

This scheme ensures that earning an E or S on all problems results in an A. A single U, on the other hand, makes an A very unlikely, which is reasonable since a U on a problem clearly shows a lack of mastery of the content for this exam.
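
To make the conversion concrete, here is a minimal sketch of the rubric-to-points rule listed above. The function name `rubric_points` and the example problem weights are hypothetical; the actual exam's point values per problem are not given in this post.

```python
def rubric_points(mark, full_credit):
    """Convert a rubric mark into points for a problem worth `full_credit`."""
    conversions = {
        "E": full_credit,        # full credit
        "S": full_credit - 1,    # one point below full credit
        "N": full_credit * 0.6,
        "U": full_credit * 0.2,
        "Blank": 0,
    }
    return conversions[mark]


# Hypothetical exam excerpt: (mark earned, points the problem is worth).
results = [("E", 10), ("S", 15), ("N", 20), ("U", 5)]
total = sum(rubric_points(mark, worth) for mark, worth in results)
possible = sum(worth for _, worth in results)
print(f"{total} out of {possible}")
```

Note that an S costs a flat 1 point regardless of the problem's size, while N and U scale with the problem's point value.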

Module 04: Data Wrangling

  1. Prepare (due M 1/31)
    1. Content below
    2. Sakai quizzes
  2. Peer Instructions – See on the class forum
  3. Homework (due Su 2/6)
  4. Worked Example

Content (Slides in the Box folder)

4.A – What is Wrangling

  1. Data sources, formats, and importing (26 min.)
  2. Common data cleaning problems (16 min.)
  3. Read Section 3.4 Handling Missing Data from Python Data Science Handbook

4.B – Wrangling Text

  1. Python string operations (16 min.)
  2. Introduction to regular expressions (18 min.)
  3. Read Section 3.10 Vectorized String Operations from Python Data Science Handbook

Optional Supplements

Module 03: Probability

  1. Prepare (due M 1/24)
    1. Content below
    2. Sakai quizzes
  2. Peer Instructions – See on the class forum
  3. Homework (due Su 1/30)
  4. Worked Examples

Content (Slides in the Box folder)

3.A – Foundations of Probability (52 min.)

  1. Outcomes, Events, Probabilities (15 min.)
  2. Joint and Conditional Probability (11 min.)
  3. Marginalization and Bayes’ Theorem (15 min.)
  4. Random Variables and Expectations (11 min.)

3.B – Distributions of Random Variables (46 min.)

  1. Distributions, Means, Variance (19 min.)
  2. Monte Carlo Simulation (15 min.)
  3. Central Limit Theorem (12 min.)
    1. Slide 26 in the video has a typo that is fixed in the pdf version of the slides on Box. In the video, it says the probability is <= 0.95, but it should say < 0.05.

Optional Supplements

You can access OpenIntro Statistics, an excellent free online textbook co-authored by Duke faculty, here. You can pay a suggested but adjustable price for a tablet-friendly pdf, or you can get the regular pdf for free. For this module, the following optional readings may be particularly helpful supplements:

  • Chapter 3: Probability. This provides more information on many of the topics from the above videos in Foundations of Probability.
  • Chapter 4: Distributions of random variables. This provides much more information about particular classic distributions than is provided in 3.B.1.
  • Chapter 5.1: Point estimates and sampling variability. This provides more information on some of the topics from 3.B.2-3.

In addition, you can find documentation for the two pseudorandom number generation / sampling libraries in Python that we mentioned here:

Module 02: Numpy & Pandas

  1. Prepare (due M 1/17)
    1. Content below
    2. Sakai quizzes
  2. Peer Instructions
    1. DataFrame Indexing: Round 1, Round 2
    2. Series Adding: Round 1, Round 2
    3. hstack/vstack: Round 1, Round 2
    4. Slicing: Round 1, Round 2
  3. Homework (due Su 1/23)
  4. Worked Example

Content (Slides in the Box folder)

2.A – Numpy (1 hour)

  1. Why Numpy (8 min.)
  2. Numpy Array Basics (15 min.)
  3. Numpy Universal Functions (20 min.)
  4. Numpy Axis (14 min.)

2.B – Pandas (45 min.)

  1. Why Pandas (7 min.)
  2. Pandas Series (19 min.)
  3. Pandas Dataframe (21 min.)

Optional Supplements

Module 01: What is Data Science, Anaconda, Python, & Jupyter

  1. Prepare (due M 1/10)
    1. Content below
    2. Quiz is on Sakai
    3. Install Anaconda
  2. Peer Instructions (these will open when we use them)
    1. lambda with min/max: Round 1, Round 2
    2. Sorting: Round 1, Round 2
    3. Notebooks I: Round 1, Round 2
    4. Notebooks II: Round 1, Round 2
  3. Homework (due Su 1/16)

Content (Slides in the Box folder)

1.A – What is Data Science? (in-class on 1/7 or see recording)

1.B – Python3 (12 min.)

  1. Python vs. Java (3 min.)
  2. Data Types (2 min.)
  3. Iteration, Functions, Classes (7 min.)

1.C – Python for Data Science

  1. Anaconda and Jupyter (10 min.)
  2. Jupyter Notebook Demo (11 min.)

Optional Supplements