Author: Dr Kristin Stephens-Martinez, Ph.D.

(In-person) Exam Retakes

This post outlines what the in-person exam retakes will be like.

Tell us you are coming by filling out the retake exam form (this helps us save paper).

  • When: Thursday 11/14, 2-5 pm
    • Exam 1 Retake: 2-2:55 pm
    • Exam 2 Retake: 3-3:55 pm
    • Exam 3 Retake: 4-4:55 pm
    • (We will likely have a standard deviation of 5 minutes around these times, but we will not start early)
  • Where: Physics 130 (usual lecture room)
  • There are three in-person exam retakes, one for each of the midterm exams.
  • Each retake will cover only the original modules covered by that exam. See the calendar or the original exam logistics posts for that information.
  • You are allowed one helper sheet per exam. If you bring multiple sheets to use on different exams, you may have only one out during each exam.
  • Each exam will be 55 minutes long.
    • If you have an SDAO accommodation, you need to schedule a time with the testing center. Multiply 55 minutes by your extra-time multiplier and by the number of exams you plan to retake. We will handle merging the exam PDFs into a single “exam” retake.
  • The number of questions on each exam has been reduced to fit this new time limit. The points were redistributed so that the assessed concepts are still worth about the same.
  • There will be no regrade window due to the necessary turnaround time for submitting grades. If you wish to discuss your grade, you must email Prof. Stephens-Martinez.
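
As a rough illustration of the SDAO time calculation above, here is a minimal Python sketch. It assumes your accommodation is expressed as a multiplier (for example, 1.5 for time-and-a-half). The function name and the example values are hypothetical, so confirm your actual accommodation with the testing center.

```python
BASE_MINUTES = 55  # length of one retake, per this post


def total_retake_minutes(extra_time_multiplier: float, num_exams: int) -> float:
    """Return 55 minutes x extra-time multiplier x number of retakes."""
    if num_exams not in (1, 2, 3):
        raise ValueError("There are only three retakes to choose from.")
    return BASE_MINUTES * extra_time_multiplier * num_exams


# Hypothetical example: time-and-a-half (1.5x) for two retakes.
print(total_retake_minutes(1.5, 2))  # 165.0
```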

Specific information

  • Exam 1 Retake
  • Exam 2 Retake
    • Bring your calculator. You can also have it out for the Exam 1 and Exam 3 Retakes, but it is not required for those.
    • We will provide a reference sheet again.
    • See Exam 2 logistics for any other information.
  • Exam 3 Retake
    • Bring a calculator if you want to.
    • The front of the exam will have the precision, recall, and accuracy formulas for your reference.
    • See Exam 3 logistics for any other information.

Exam 03 Logistics

This post outlines what Exam 3 will be like.

Two different pieces are considered part of the exam. There’s the Practicum and the in-person Exam.

Exam General Information

  • Modules covered:
    • 07 Statistical Inference
    • 08 Prediction & Supervised Machine Learning
  • Practice Exam

Practicum

  • When: Wednesday, 11/15, starts 12:01am EST, ends 11:59pm EST
    • It should take your group around 3-4 hours to complete, but your group can take as long as you want. It must be submitted before the end of the day.
  • This will have questions from both modules.
  • The rules for working together as a group are the same as in Practicum 02’s logistics.
    • To be clear, collaboration means that help or sharing of answers happens in any direction. Student A and Student B could have shared answers with each other, or Student A could have shared answers with Student B without receiving Student B’s answers, or vice versa.
  • All other details are the same as Practicum 02’s logistics.

Practicum Update

  • When: Thursday 11/30 – Saturday 12/02
    • Note: This is after Thanksgiving break and during Module 10
  • All other details are the same as Practicum 2 Update’s logistics.

In-person Exam

  • When: Friday 11/17, during regular class time
  • This will have questions from both modules.
  • Calculators are not required unless you want to use them to calculate things like precision and recall precisely when explaining your interpretation of confusion matrices. Such calculations will not be required.
  • The exam does not have a reference sheet. We will give you the formulas for precision, recall, and accuracy on the front page of the exam.
  • All other details are the same as In-person Exam 2’s logistics.
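
If you would like to practice the confusion-matrix calculations mentioned above, here is a minimal Python sketch using the standard definitions of precision, recall, and accuracy. The function names and the example counts are made up for illustration, and the notation on the exam's front page may differ.

```python
def safe_divide(numerator: float, denominator: float) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute precision, recall, and accuracy from confusion-matrix counts.

    tp/fp/fn/tn = true positives, false positives, false negatives, true negatives.
    """
    return {
        "precision": safe_divide(tp, tp + fp),  # of predicted positives, how many are correct
        "recall": safe_divide(tp, tp + fn),     # of actual positives, how many were found
        "accuracy": safe_divide(tp + tn, tp + fp + fn + tn),  # overall fraction correct
    }


# Hypothetical counts, not taken from any exam.
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```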

Grading Scale and Points Allocation

This is the same as Exam 1’s logistics.

Exam 02 Logistics

This post outlines what Exam 2 will be like.

Two different pieces are considered part of the exam. There’s the Practicum and the in-person Exam.

Exam General Information

  • Modules covered:
    • 05 Probability
    • 06 Combining Data
  • Practice Exam – will be posted in Box if there is time

Practicum

  • When: Wednesday, 10/25, starts 12:01am EST, ends 11:59pm EST
    • It should take your group around 2-3 hours to complete, but your group can take as long as you want. It must be submitted before the end of the day.
    • NOTE: This is longer than Practicum 1’s estimate. The Practicum 1 survey clearly showed that my prediction equation, which is based on TA speed, was incorrect, and I have adjusted it accordingly.
  • This will have questions from both Module 05 Probability and Module 06 Combining Data.
  • Working together as a group:
    • Group requirement: No more than 3 people in total may contribute to or collaborate on any one notebook.
    • Your group can consist of 1-3 people, i.e., you may work on the practicum solo or with 1 or 2 other people.
    • Your group’s members may submit notebooks separately, as a single submission, or in subsets (e.g., a 3-person group could submit 2 notebooks, where one person submits solo and the others submit as a pair).
    • All members of a group must be listed in the notebook. There will be a 0-point test case with three variables for the NetIDs of everyone in the group. If you have fewer than 3 members, the notebook will state what to fill in for the other variables.
    • Here is an example of what this means you may not do:
      • StudentA, StudentB, and StudentC work with each other.
      • StudentB and StudentC submit as a pair.
      • StudentA later works with StudentD.
      • StudentA and StudentD submit separately.
      • This is not allowed because StudentA’s and StudentD’s notebooks would then have a total of 4 people contributing to them, since StudentA collaborated with 3 others in total. Moreover, StudentB and StudentC will also get in trouble if their code is too similar to StudentA’s or StudentD’s code, because it is very unlikely they can provide strong evidence that they didn’t work with StudentD.
  • All other details are the same as Practicum 1’s logistics.
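
One way to sanity-check your own situation against the group requirement above is to treat "worked with" as links between students and see how many people end up connected. The Python sketch below is only one possible reading of the rule, not an official checker. The function names are hypothetical, and the pairs reproduce the disallowed StudentA through StudentD example.

```python
from collections import defaultdict

MAX_GROUP_SIZE = 3  # per the group requirement in this post


def collaboration_groups(pairs):
    """Return sets of students connected, directly or indirectly, by collaboration."""
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, groups = set(), []
    for student in neighbors:
        if student in seen:
            continue
        # Depth-first search collects everyone reachable from this student.
        stack, group = [student], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbors[current] - group)
        seen |= group
        groups.append(group)
    return groups


def oversized_groups(pairs):
    """Return the connected groups that exceed the 3-person limit."""
    return [g for g in collaboration_groups(pairs) if len(g) > MAX_GROUP_SIZE]


# The disallowed example: A, B, and C work together, then A works with D.
pairs = [
    ("StudentA", "StudentB"),
    ("StudentA", "StudentC"),
    ("StudentB", "StudentC"),
    ("StudentA", "StudentD"),
]
print(oversized_groups(pairs))  # one group containing all four students
```

Under this reading, StudentB and StudentC are flagged along with StudentA and StudentD, which matches the warning that they can also get in trouble.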

Practicum Update

In-person Exam

  • When: Friday 10/27, during regular class time
  • It focuses on Module 05 Probability.
  • Bring a calculator.
  • We will print and provide a reference sheet for you at the exam. You can preview it in the practice_exams Box folder.
    • You can still bring your own cheatsheet, just like last time.
  • All other details are the same as In-person Exam 1’s logistics.

Grading Scale and Points Allocation

This is the same as Exam 1’s logistics.

Exam 1 Logistics

This post outlines what Exam 1 will be like.

Two different pieces are considered part of the exam. There’s the Practicum and the in-person Exam.

Exam General Information

  • Modules covered: 2, 3, 4 (module 1 is covered in that you will be writing in Python)
  • Practice Exam – will be posted if there is time

Practicum

  • Done with 1 or 2 other students (so your team of 4 or 5 can split in half). You will submit through the group submission process on Gradescope.
  • When: Wednesday, 10/4, starts 12:01am EST, ends 11:59pm EST
    • It should take your group only around 75 minutes, but your group can take as long as you want. It must be submitted before the end of the day.
    • There is no class on this day, so your group has time to work on this.
  • It is a take-home, open book, open note, open internet, and open LLM exam.
  • Closed to any person outside of your group. So, no asking someone to do it for you and no asking on places like Stack Overflow.
  • It focuses on coding and interpreting the results of that code.
  • Consists of a Jupyter Notebook and a data set.
    • Recommendation: Discuss in advance with your group how you will create the final submission and who will submit it.
  • A Canvas announcement will go out at the start of the exam with a link to the Box folder containing all the files you need.
  • The act of submitting and being part of a submission means that you are upholding the Duke community standard that you contributed equally to this submission and only talked amongst yourselves when working on it.
  • Protect the integrity of the exam and your exam submission.
    • Take your exam:
      • in a secure location where only your group can see your screen or talk to you.
      • in a place where you will not be distracted or tempted to talk to someone outside of your group.
    • Only after grades have been published for the Practicum Update can you do the following. Doing any of these before grades are published will be considered a violation of the Duke Community Standard.
      • Discuss what you did on the exam.
      • Show your solutions to students outside of your group.
      • View other solutions.
  • If you have a question during the exam, ask it as a new private message on the class forum or in helper hours.
    • We cannot help you debug your code. If it appears as if the notebook or autograder is not working, but it turns out to be your own code that has a bug, you will be graded according to your submission.
    • We will do our best to always have someone checking the forum. However, we cannot make promises someone will instantly answer your question.
    • The exam is tested for readability, so the wording should be straightforward.

Practicum Update

  • Your group will have the option to update your Practicum after seeing the results of your Practicum grade. If you choose to submit an update, your grade for the Practicum will be as follows:
    • Practicum (original): 15%
    • Practicum Update: 85%
  • When: Thursday 10/12 – Saturday 10/14
    • Note: This is partially during Fall break.
    • This is during Module 6, which has due dates as usual despite Fall break. We plan to release Module 6 early enough that you can finish it before Fall break, and there are late tokens.
  • It is your responsibility to work with your group to do the update.
  • For the update, you will do the following:
    • Update your original notebook as needed.
    • Add a new Markdown cell at the bottom and list all of the changes you made from your original submission.
    • We may grade outside of your changes, but giving us a list will tell us what to focus on.
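
To see how the 15%/85% weighting above plays out, here is a minimal Python sketch. The function name and the scores are hypothetical. It also assumes that if you do not submit an update, your original Practicum score simply stands, which is our reading of the policy rather than something stated outright.

```python
from typing import Optional

ORIGINAL_WEIGHT = 15  # percent, per this post
UPDATE_WEIGHT = 85    # percent, per this post


def practicum_grade(original: float, update: Optional[float] = None) -> float:
    """Combine the original and updated Practicum scores (both out of 100)."""
    if update is None:
        # Assumption: with no update submitted, the original score stands.
        return original
    return (ORIGINAL_WEIGHT * original + UPDATE_WEIGHT * update) / 100


# Hypothetical scores: 60 on the original, 90 on the update.
print(practicum_grade(60, 90))  # 85.5
print(practicum_grade(60))      # 60
```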

In-person Exam

  • When: Friday 10/6, during regular class time
  • Is in-person only
  • It is a paper exam taken during class.
  • There will be multiple versions.
  • It focuses on thinking like a data scientist and will not require writing code.
    • It may have the results of calling the describe function on a data set, so know what that function does.
  • You may bring one piece of paper as a cheatsheet and can put things on the front and back.

Grading Scale and Points Allocation

For the questions that do not have a clear correct or incorrect answer or where partial credit is warranted, the following rubric will be used.

  • E (Exemplary) – Work that meets all requirements and displays full mastery of all learning goals and material, with code that is clean and easy to read (see the practice exam for examples of what this means).
  • S (Satisfactory) – Work that meets all requirements and displays at least partial mastery of all learning goals as well as full mastery of core learning goals.
  • N (Not yet) – Work that does not meet some requirements and/or displays developing or incomplete mastery of at least some learning goals and material.
  • U (Unassessable) – Work that is missing, does not demonstrate meaningful effort, or does not provide enough evidence to determine a level of mastery.

The number of points earned is distributed across the problems based on the number of learning goals they are testing. The rubric will be converted to points as follows:

  • E = full credit
  • S = E_full_credit – 1 or 2
  • N = E_full_credit * 0.6
  • U = E_full_credit * 0.2
  • Blank = 0

Unit tests in the autograder for the Practicum will earn you points up to, but not quite, the U level. Whether an S is 1 or 2 points fewer than an E depends on the exam part. Each part is out of 100 points. The goal is that earning only S’s results in a low A. So, for example, if the Practicum has only 4 questions, an S would lose 2 points compared to an E, which means getting all S’s is a low A (92%) but still guarantees an A on the Practicum.
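
The conversion above can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and the example assumes 4 equally weighted questions worth 25 points each, whereas real exams distribute points by the number of learning goals tested.

```python
def rubric_points(level: str, full_credit: float, s_penalty: int = 2) -> float:
    """Convert a rubric level to points, following the conversion in this post.

    s_penalty is 1 or 2, depending on the exam part.
    """
    if level == "E":
        return full_credit
    if level == "S":
        return full_credit - s_penalty
    if level == "N":
        return full_credit * 0.6
    if level == "U":
        return full_credit * 0.2
    if level == "Blank":
        return 0
    raise ValueError(f"Unknown rubric level: {level}")


# Hypothetical Practicum: 4 questions, 25 points each, all earning an S.
all_s_total = sum(rubric_points("S", 25, s_penalty=2) for _ in range(4))
print(all_s_total)  # 92
```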

Project: Group Formation

Due: Friday, September 8th

In place of a final exam, this course has a collaborative final project where we ask you to bring your data science skills to bear on a research project of your own choosing. To help groups without enough people (you need 4-5), you must indicate who will be in your group by filling out the group formation survey on Gradescope no later than Friday, September 8th. Use the group submission feature on Gradescope to include all of your group members on a single submission.

The survey should only take a couple of minutes. If you do not have anyone to work with or do not have enough people, we will assign you to a group or add more people to your group. Please fill out the form so we know your interests or your subgroup’s interests.

If you would like to start thinking about possible projects now, some ideas are listed below. You can also brainstorm using the strategies outlined in the Initial Plan post (TBA). However, you are not required to have a concrete project idea until the proposal.

Project ideas

Not sure how to get started? Looking for examples of what a data science project might look like? Here are some of the topics that students studied in Spring 2020:

  • Comparing Stock Market Losses between SARS and SARS-CoV-2
  • Recessions, Depressions, and Depression: Mental Health in Relation to Economic Factors
  • Predicting North Carolina Election Outcomes
  • Relating Text Analysis of Corporate Reports and Stock Performance
  • Modeling Consumer Flight Behavior Based on Economic Indicators
  • Predicting COVID-19 Death Tolls from Google Search Trends
  • Sentiment Analysis of COVID-19 Tweets
  • Economic Status and Drug Overdose in North Carolina
  • Analyzing Gender and Tech Careers
  • Political Landscape According to Social Media
  • Forecasting Market Shocks and Performance using Article Headlines
  • Tracking Recidivism in US Prisons
  • Understanding Airbnb’s Impact on Evictions
  • Understanding Musical Tastes (Music Recommender System)
  • Human Impact on Climate since the Industrial Revolution
  • The Troll Toll: An Investigation into Troll Tweets

And here is an archive of summer Data+ projects from the last several years. In Data+, teams of about 4 undergraduate students collaborate over the summer on a data science project. You should be able to see final presentations and/or executive summary slides for most projects; feel free to browse for inspiration.

Example Data Sources

Below, we have some examples of datasets or where you might find data. You should work with data that is interesting to you and should feel free (strongly encouraged even) to look for sources yourself. These are listed just as possibilities and starting places.

  • Kaggle maintains several thousand public datasets of interest in a variety of topics. Kaggle also hosts several prediction challenges; one idea for a machine learning project is to enter one of these competitions as a team.
  • The Yelp Dataset is provided by Yelp as a research challenge with lots and lots of data about reviews, businesses, images, and cities – text data, rich JSON data, etc.
  • The University of California Irvine maintains a large UCI ML repository of publicly contributed datasets aimed toward machine learning tasks of all types. They range from small simple example datasets to large and complicated datasets from specific scientific domains.
  • has a huge compilation of data sets produced by the US government. The US Census Bureau also publishes datasets from all of its survey work. Similarly, The Supreme Court Database tracks all cases decided by the US Supreme Court, and provides links to all kinds of information about the US Congress and all votes cast by its members.
  • Duke University Library Digital Repository Research Data
  • ICPSR – An international consortium of more than 750 academic institutions and research organizations, Inter-university Consortium for Political and Social Research (ICPSR) provides leadership and training in data access, curation, and methods of analysis for the social science research community. ICPSR maintains a data archive of more than 250,000 files of research in the social and behavioral sciences. It hosts 21 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields.

Restart and Run All

Here is a guide on how to submit properly formatted .ipynb files for homework and exams.

This is important because a common way to detect bugs that the autograder might find is to first restart the kernel and run everything. Moreover, it is the equivalent of ensuring that you are submitting a polished notebook.

Steps to restart the kernel and run all

Go to the button labeled “Kernel” at the top of the page.

Click on the “Kernel” button to open its dropdown menu. Now click “Restart & Run All.”

A confirmation box will then appear. Click the red button labeled “Restart and Run All Cells.”

How to confirm it is correctly formatted

After following the steps above, your notebook cells’ “In [#]” labels will be in numerical order, starting from “In [1]”. Make sure to confirm that all code cells have run. This is a properly formatted .ipynb file.


Example of an incorrectly formatted notebook

Although the cell labels in the .ipynb file below are in numerical order, the first cell in the file does not start with “In [1]”, so it is deemed an improperly formatted .ipynb file.
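
If you want to double-check a notebook without eyeballing every label, the Python sketch below reads the .ipynb file (which is JSON) and tests whether the code cells are numbered 1, 2, 3, and so on in order. This is not the autograder's check. The function names are hypothetical, and skipping empty code cells is our assumption, since they do not receive a number when run.

```python
import json


def is_restart_and_run_all(notebook: dict) -> bool:
    """Return True if non-empty code cells are numbered 1..n in order."""
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        # "source" may be a string or a list of strings; join handles both.
        if cell.get("cell_type") == "code" and "".join(cell.get("source", "")).strip()
    ]
    return counts == list(range(1, len(counts) + 1))


def check_file(path: str) -> bool:
    """Load an .ipynb file from disk and run the check above."""
    with open(path, encoding="utf-8") as f:
        return is_restart_and_run_all(json.load(f))


# Tiny in-memory examples that mimic the notebook file structure.
good_nb = {"cells": [
    {"cell_type": "markdown", "source": ["# Title"]},
    {"cell_type": "code", "execution_count": 1, "source": ["x = 1"]},
    {"cell_type": "code", "execution_count": 2, "source": ["x + 1"]},
]}
bad_nb = {"cells": [  # in order, but does not start at 1
    {"cell_type": "code", "execution_count": 3, "source": ["x = 1"]},
    {"cell_type": "code", "execution_count": 4, "source": ["x + 1"]},
]}
print(is_restart_and_run_all(good_nb), is_restart_and_run_all(bad_nb))  # True False
```

To check your own file, call check_file with the path to your notebook.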
