All posts by Dr Kristin Stephens-Martinez, Ph.D.

(In-person) Exam Retakes

This post outlines what the in-person exam retakes will be like.

Tell us you are coming by filling out this retake exam form (by Monday 4/28; help save paper!)

  • When: Thursday 5/1, 2-5 pm
    • Exam 1 Retake: 2-3:15 pm
    • Exam 2 Retake: 3:45-5:00 pm
    • (We will likely have a wiggle room of 5 minutes around these times, but we will not start early)
  • Where: French Science 2231 (usual lecture room)
  • Your exam grade for each exam will be the max of the original and the retake.
  • There are two in-person exam retakes, one for each of the midterm exams.
  • Each retake will cover only the original modules covered by that exam. For that information, see the calendar or the original exam logistics posts.
  • You are allowed one helper sheet per exam. If you bring a different sheet for each exam, you may only have out the one for the exam you are currently taking.
  • Each exam will be 75 minutes long.
    • If you have an SDAO accommodation, you need to schedule a time with the testing center. To work out how long to book, multiply 75 minutes by your extra-time factor and by the number of exams you plan to retake. We will handle merging the exam PDFs into a single “exam” retake.
  • There will be no regrade window due to the necessary turnaround time for submitting grades. If you wish to discuss your grade, you must email Prof. Stephens-Martinez.

Specific information

  • Exam 1 Retake
  • Exam 2 Retake
    • The data set used for this exam is Seaborn’s planets data set
    • If this is the only exam you are retaking, DO NOT come into the room until after the Exam 1 Retake is done (So aim to arrive after 3:30). Please do not disturb your classmates.
    • See Exam 2 logistics for any other information.

Exam 02 Logistics: Practicum

This post outlines the Practicum 2 part of Exam 2. See the in-person Exam 2 or Practicum 2 Update posts for details on the other parts. Study exams are in the Box folder.

  • Modules: 06 – 08
  • When: Friday 4/4 12:01am to Saturday 4/5 11:59pm
    • There is no class on Friday.
    • It should take you (and your partner) around 3-6 hours to complete, but you can take as long as you want. It must be submitted before the end of Saturday.
    • NOTE: The expected hours to complete is longer than Practicum 1’s estimate.
  • All other details are the same as Practicum 1’s logistics.

Exam 02 Logistics: In-Person Exam

This post outlines the in-person part of Exam 2. See the Practicum 2 or Practicum 2 Update posts for details on the other parts.

  • Modules covered: 06 – 08
  • When: Wednesday 4/2, during regular class time
  • A calculator is strongly recommended. But if you forget, you will be okay because no exact calculations are required.
  • The formulas you will need are printed on the exam’s front page. See an example on the study exam.
  • Code on the exam
    • The data set used for this exam is Seaborn’s diamonds data set. We recommend familiarizing yourself with the columns’ meanings.
    • It will have code reading (so know what these functions do), in particular:
      • The results of calling the describe function on a data set.
      • The results of a pandas function call: groupby and pivot_table
  • All other details are the same as In-person Exam 1’s logistics, including:
    • You will not write code.
    • We will release a study exam and Canvas study quiz.
    • You may bring one piece of paper as a helper sheet and can put things on the front and back.
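
To practice the code reading mentioned above, here is a small sketch of the pandas calls describe, groupby, and pivot_table. The table is made up for illustration: its column names imitate the diamonds data set, but the values are invented and are not from the exam.

```python
import pandas as pd

# Tiny invented table; only the column names mirror the diamonds data set.
df = pd.DataFrame({
    "cut": ["Ideal", "Ideal", "Good", "Good"],
    "color": ["E", "J", "E", "J"],
    "price": [300, 500, 200, 400],
})

# describe: count, mean, std, min, quartiles, and max of a numeric column.
summary = df["price"].describe()

# groupby: split rows by "cut", then average "price" within each group.
mean_by_cut = df.groupby("cut")["price"].mean()

# pivot_table: one row per cut, one column per color, mean price in each cell.
table = df.pivot_table(values="price", index="cut", columns="color", aggfunc="mean")

print(summary)
print(mean_by_cut)
print(table)
```

When reading output like this, try to predict each number before running the code: which rows fall into each group, and which statistic is being reported.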

Grading Scale and Points Allocation

This is the same as Exam 1’s logistics in that problems are graded on an ESNU scale. How many points each rubric level is worth depends on the question. A question’s worth depends on how many questions in the exam are testing the same concept (more questions for that concept means fewer points for each question). The goal is that an exam with only S’s on every question results in a 90% because it indicates a Satisfactory level of understanding of all the concepts the exam is testing, rather than Exemplary.

Exam 02 Logistics: Practicum Update

This post outlines the Practicum 2 Update part of Exam 2. See the in-person Exam 2 or Practicum 2 posts for details on the other parts. Study exams are in the Box folder.

  • When: Thursday 4/10 – Saturday 4/12
    • Wednesday 4/9’s class will cover general feedback on Practicum 2.
    • Friday 4/11’s class is optional and will be for Project consulting or questions on Practicum 2. Prof. Stephens-Martinez will run it over Zoom and will be in her office if you want to ask her in person.
  • All other details are the same as Practicum 1 Update’s logistics.

Module 10: Deep Learning

  1. Prepare (due Monday 4/14)
    1. Content below
    2. Canvas quizzes
  2. Class engagement – See on the class forum
  3. Homework (due Sun 4/20, late due 4/23) [Link]
  4. There are no worked examples

Content

10 Deep Learning

  1. Neural Networks and Applications (16 min.)
  2. Forward Propagation (10 min.)
  3. Gradient Descent (14 min.)
  4. Back Propagation (11 min.)
  5. Convolutional Neural Network (15 min.)
  6. Introducing Pytorch (23 min.)

Optional Supplements

Pytorch

Unlike most other libraries for this course, Pytorch is not included in the basic Anaconda installation. To use Pytorch, we suggest you choose one of two options.

  • Install Pytorch locally (for free). You can see the directions on the website: Select the stable build, your operating system, Conda (for Anaconda), Python, and CPU to see install directions for your particular setup. (CUDA is used to support hardware acceleration with NVIDIA graphics cards and is not necessary for this course).
  • Use Pytorch in a Jupyter notebook in the cloud (also for free). The easiest way to do this if you have a Google account is with a Google colab notebook; Pytorch will already be available to you in this cloud environment.

You can find the official Pytorch documentation here. Of particular note are the Pytorch tutorials, including Pytorch recipes which serve as small examples of common tasks.

Book

The deep learning book is available free online and is authored by some of the leading experts in machine learning with deep artificial neural networks. It is very detailed and in-depth and is purely for those who are interested in learning more about deep learning theory now or in the future; you do not need to read the book for this course.

Module 07: Statistical Inference

  1. Prepare (due Mon 3/3)
    1. Content below
    2. Canvas quizzes
  2. Class engagement – See on the class forum
  3. Homework (due Sun 3/16) [Link]
  4. Worked Example [Link]

Content (Slides in the Box folder)

Note: the slides for this module have been updated. Please switch to the “slides” panel when viewing the video in Panopto. DO NOT stay on the “screen” panel, as the recorded screen showed the old slides (which contained typos and old information).

07.A – Confidence Intervals and Bootstrapping

  1. Intro Confidence Intervals (17 min.)
  2. Confidence Intervals in Python (17 min.)
  3. Misconceptions about Confidence Intervals (short read)
    OR
    The 3rd paragraph (starting with “As a technical note…”) in this link

07.B – Hypothesis Testing

  1. Intro Hypothesis Testing and Proportions (14 min.)
  2. Hypothesis Testing Means and More (33 min.)

Optional Supplements

You can access an excellent free online textbook on OpenIntro Statistics here, co-authored by Duke faculty. You can pay a suggested but adjustable price for a tablet-friendly pdf, but you can also just get the regular pdf for free. For Module 7, the following optional readings may be particularly helpful supplements:

  • Chapter 5.2 Confidence intervals for a proportion. This provides introductory material on confidence intervals, elaborating on 07.A.1.
  • Chapter 5.3 Hypothesis testing for a proportion. This elaborates on the introduction to hypothesis testing from 07.B.1.
  • Chapters 7.1, 7.3, and 7.5 cover material from 07.B.2 on using t-tests for a single mean, the difference of two means, and many pairwise means, respectively.
  • Chapter 6.3 discusses the chi-square test for categorical data introduced in 07.B.2.

In addition, here is the documentation for the scipy.stats library that implements most of the functionality described here as well as many other useful statistical functions.
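
As a quick orientation to scipy.stats, here is a minimal sketch of the tests named in the readings above: a one-sample t-test, a two-sample t-test, and a chi-square test on a contingency table. All of the numbers are invented for illustration.

```python
from scipy import stats

# Invented samples, for illustration only.
sample_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5]
sample_b = [4.2, 4.8, 4.5, 5.0, 4.4, 4.9, 4.6]

# t-test for a single mean: is the mean of sample_a different from 5.0?
one_sample = stats.ttest_1samp(sample_a, popmean=5.0)

# t-test for the difference of two means (Welch's version, unequal variances).
two_sample = stats.ttest_ind(sample_a, sample_b, equal_var=False)

# Chi-square test of independence on an invented 2x2 table of counts.
observed = [[20, 30], [35, 15]]
chi2, p_chi2, dof, expected = stats.chi2_contingency(observed)

print(one_sample.statistic, one_sample.pvalue)
print(two_sample.statistic, two_sample.pvalue)
print(chi2, p_chi2, dof)
```

Each result carries a test statistic and a p-value; interpreting them is the part covered in the module videos.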

Exam 01 Logistics: In-Person Exam

This post outlines the in-person part of Exam 1. See the Practicum 1 or Practicum 1 Update posts for details on the other parts.

  • Modules covered: 2 – 5
  • When: Wednesday 2/26, during regular class time
  • Is in-person only
  • Bring a calculator.
  • It is a paper exam taken during class.
  • We will print and provide a reference sheet for you at the exam. See what it is in the exam Box folder.
  • You may bring one piece of standard-sized paper as a cheatsheet and can put things on the front and back.
  • There will be multiple versions.
  • Code on the exam
    • It will have no code writing and focus more on thinking like a data scientist.
    • It will have code reading (so know what these functions do), in particular:
      • The results of calling the describe function on a data set.
      • The results of a seaborn function call: catplot, displot, or relplot.
    • You will not be tested on regular expressions on the paper exam.
    • The data set used for this exam is Seaborn’s taxis data set. We recommend familiarizing yourself with the columns’ meanings.

Study Exams

  • Canvas Exam 1 Study Quiz
    • Worth 2 class engagement points
    • Includes randomized question pools of all the auto-gradable questions from past exams.
  • Study Exam in exam Box folder
    • You may see a question in here that is duplicated from the Canvas quiz. That is because part of it is not auto-gradable, and we wanted to ensure you saw what the question will look like on the actual exam.
    • Solutions for the exam in Box will be released on the Friday before the exam. This is to encourage everyone to try the study exams before looking at the solutions.

Grading Scale and Points Allocation

For the questions that do not have a clear correct or incorrect answer or where partial credit is warranted, the following rubric will be used.

  • E (Exemplary) – Work that meets all requirements and displays full mastery of all learning goals and material.
  • S (Satisfactory) – Work that meets all requirements and displays at least partial mastery of all learning goals as well as full mastery of core learning goals.
  • N (Not yet) – Work that does not meet some requirements and/or displays developing or incomplete mastery of at least some learning goals and material.
  • U (Unassessable) – Work that is missing, does not demonstrate meaningful effort, or does not provide enough evidence to determine a level of mastery.

The number of points earned is distributed across the problems based on the number of learning goals they are testing. The rubric will be converted to points as follows:

  • E = full credit
  • S = E_full_credit – some small value resulting in around E_full_credit*0.9
  • N = E_full_credit * 0.6
  • U = E_full_credit * 0.2
  • Blank = 0
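
The conversion above can be sketched as a small lookup. This is only an illustration: the name rubric_points is hypothetical, and S is approximated as exactly 0.9 of full credit even though the actual deduction is "some small value" that varies by question.

```python
# Fraction of a question's full credit earned at each rubric level.
# Assumption: S is treated as exactly 0.9 here; the real value is only "around" 0.9.
RUBRIC_FRACTION = {"E": 1.0, "S": 0.9, "N": 0.6, "U": 0.2, "Blank": 0.0}


def rubric_points(level: str, full_credit: float) -> float:
    """Convert a rubric level into points for a question worth `full_credit`."""
    return full_credit * RUBRIC_FRACTION[level]


# A hypothetical 10-point question at each level.
for level in RUBRIC_FRACTION:
    print(level, rubric_points(level, 10))
```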

Exam 01 Logistics: Practicum

This post outlines the Practicum of Exam 1. See the in-person Exam 1 or Practicum 1 Update posts for details on the other parts.

  • Modules covered: 2 – 5
  • When: Friday 2/28 12:01am to Saturday 3/1 11:59pm
    • There is no class on Friday.
    • It should take around 2-3 hours to complete, but you can take as long as you want. It must be submitted before the deadline.
  • Study Practicum in exam Box folder
  • This can be done in a pair. See details below on the logistics, the definition of collaboration, and the consequences if collaboration happens without citation.
  • It is a take-home, open book, open note, open internet, and open LLM practicum.
    • Each question will have a variable you set to True or False to indicate if you used an LLM when answering this question.
  • It is closed to anyone outside you (and your partner if you have one). So, do not ask someone to do it for you or ask on places like Stack Overflow.
  • It focuses on coding and interpreting the results of that code.
  • Consists of a Jupyter Notebook and a data set
    • Recommendation: Discuss in advance with your partner (if you have one) how you will create the final submission and who will submit it.
  • At the start of the practicum, a Canvas announcement will go out with a link to the Box folder containing all the files you need.
  • The act of submitting or being part of a submission means that you are upholding the Duke community standard that you contributed equally to this submission and only talked amongst yourselves when working on it.
  • Protect the integrity of the practicum and your submission.
    • Take your practicum:
      • In a secure location where only you (and your partner) can see your screen (and only your partner can talk to you).
      • In a place where you will not be distracted or tempted to talk to someone beyond your partner (if you have one).
    • You can do the following only after grades have been published for the Practicum Update. Doing any of these before grades are published will be considered a violation of the Duke Community Standard.
      • Discuss what you did on the practicum.
      • Show your solutions to other students.
      • View other solutions.
  • If you have a question during the practicum, ask it as a private new message on the class forum, in helper hours, or during class time when Prof. Stephens-Martinez will be in the helper hours Zoom room.
    • We cannot help you debug your code. If the notebook or autograder appears to be not working, but it turns out your code has a bug, you will be graded according to your submission.
    • We will do our best to always have someone checking the forum. However, we cannot promise that someone will instantly answer your question.
    • The practicum is tested for readability, so the wording should be straightforward.

Collaboration on the Practicum

  • Working in a pair means you collaborated on the Practicum.
    • Collaboration – 2 people have collaborated if one or both have given or received work/help on the Practicum. Notice these are “or’s.” That means if you share your Practicum with another person, even if that person did not give you anything in return, you both are now considered collaborators and should include each other in your notebook(s) as a partner.
    • This also means that if 2 people submit together and then 1 person shares that submission with a 3rd person, who then submits something too similar to have been done in isolation, all 3 are considered collaborators because it is impossible to detect who shared with whom. This collaboration is then considered a violation of the rules and, therefore, a violation of the Duke Community Standard.
  • The NetIds of all those who worked on the notebook must be listed in the notebook. There will be a 0-point test case with two variables for the NetIds of you and your partner. If you are solo, the notebook will state what to fill in for the other variable.
    • If you do not do this and we detect your notebooks as too similar to have been done in isolation, this is considered a violation of the Duke Community Standard.
  • You and your partner may submit notebooks separately or as a single submission. If you plan to submit identical files, submit as a single submission. Please help the graders be efficient.

Grading Scale and Points Allocation

This is the same as Exam 1’s in-person exam, with the following additions:

  1. For Exemplary – The code is clean and easy to read (see the study exam for examples of what this means).
  2. Unit tests in the autograder for the Practicum will earn you points up to, but not quite, the U level.
  3. How many fewer points an S is worth compared to an E depends on the practicum part. The practicum totals 100 points. The goal is that earning only S’s results in a low A. So, for example, if the Practicum has only 4 questions, an S would lose 2.5 points compared to an E, which means getting all S’s yields 90%: a low A, but still an A on the Practicum.
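
The arithmetic in the 4-question example can be checked with a short sketch. The function name is hypothetical, and it assumes every question loses the same amount for an S, which matches the example but may not hold for every practicum part.

```python
def s_penalty_per_question(num_questions: int,
                           total_points: float = 100.0,
                           all_s_score: float = 90.0) -> float:
    """Points an S loses relative to an E so that all S's total `all_s_score`.

    Assumption: the loss is spread evenly across the questions.
    """
    return (total_points - all_s_score) / num_questions


penalty = s_penalty_per_question(4)
total_with_all_s = 100.0 - 4 * penalty
print(penalty, total_with_all_s)
```

With 4 questions the 10 points separating 100 from 90 are split four ways, which is where the 2.5 points per S comes from.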

Exam 01 Logistics: Practicum Update

This post outlines the Practicum Update part of Exam 1. See the in-person Exam 1 or Practicum 1 posts for details on the other parts.

  • Your group will have the option to update your Practicum after seeing the results of your Practicum grade. If you choose to submit an update, your grade for the Practicum will be as follows:
    • Practicum (original): 15%
    • Practicum Update: 85%
  • When: Thursday 3/6 – Saturday 3/8
    • This is during Module 7.
  • For the update, you will do the following:
    • Update your original notebook as needed.
    • Fill in the template diff cell at the top of the Practicum and list all of the changes you made from your original submission.
      • This is worth 0.5 points per question.
      • We may not grade your update properly if you do not do this.
  • We may grade outside of your changes because the Practicum aims to show your competency level in the material, not your competency plus whatever the graders accidentally missed in the first grading.
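
The 15% / 85% weighting described above can be sketched as follows. The function name is hypothetical, and the sketch assumes that if no update is submitted the original Practicum score simply stands.

```python
from typing import Optional


def practicum_grade(original: float, update: Optional[float] = None) -> float:
    """Combine the original Practicum and the Practicum Update scores.

    With an update: 15% original + 85% update.
    Assumption: without an update, the original score is the final grade.
    """
    if update is None:
        return original
    return 0.15 * original + 0.85 * update


print(practicum_grade(60, 100))  # a weak original, fully corrected update
print(practicum_grade(60))       # no update submitted
```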

Module 06: Combining Data

  1. Prepare (due Mon 2/17)
    1. Content below
    2. Canvas quizzes
  2. Class engagement – See on the class forum
  3. Homework (due Sun 2/23) [Link]
  4. Worked Example [Link]

Content (Slides in the Box Folder)

06.A – Summarizing Data

  1. Read Section 3.8 Aggregating and Grouping from Python Data Science Handbook.
  2. Read Section 3.9 Pivot Tables from Python Data Science Handbook.

06.B – Merging Data

  1. Read Section 3.6 Concat and Append from Python Data Science Handbook. Please note that the join_axes optional parameter mentioned in this section has been deprecated in the pandas library, so you can skip over the details on this parameter.
  2. Read Section 3.7 Merge and Join from Python Data Science Handbook.
  3. Table Relationships (4 min.)
  4. Which Join to Use (4 min.)
  5. Record Linkage (8 min.)
  6. Fuzzy Matching (21 min.)
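
If you are curious what replaces the join_axes parameter noted in the Concat and Append reading, one common approach is to concatenate first and then reindex the result. The sketch below uses two made-up DataFrames; it is an illustration, not part of the assigned reading.

```python
import pandas as pd

# Two invented tables whose row labels only partly overlap.
left = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])
right = pd.DataFrame({"b": [10, 20, 30]}, index=["y", "z", "w"])

# Older pandas allowed: pd.concat([left, right], axis=1, join_axes=[left.index])
# Without join_axes, concatenate side by side, then keep only left's row labels.
aligned = pd.concat([left, right], axis=1).reindex(left.index)

print(aligned)
```

Rows that exist only in right (here "w") are dropped, and rows missing from right (here "x") get NaN in column b.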

Optional Supplements