Author: Dr Kristin Stephens-Martinez, Ph.D.

(In-person) Exam Retakes

This post outlines what the in-person exam retakes will be like.

Tell us you are coming by “registering” for the retake exam via this form (this helps us save paper).

  • When: Tuesday 4/30, 9 am-12 pm
    • Exam 1 Retake: 9-10:15 am
    • Exam 2 Retake: 10:45 am-12:00 pm
    • (We will likely have a standard deviation of 5 minutes around these times, but we will not start early)
  • Where: LSRC B101 (usual lecture room)
  • There are two in-person exam retakes, one for each of the midterm exams.
  • Each retake will cover only the original modules covered by that exam. For that information, see the calendar or the original exam logistics posts.
  • You are allowed one helper sheet per exam. If you bring a different sheet for each exam, you may only have the sheet for the current exam out.
  • Each exam will be 75 minutes long.
    • If you have an SDAO accommodation, you need to schedule a time with the testing center. Multiply 75 minutes by your extra time and by the number of exams you plan to retake. We will handle merging the exam PDFs into a single “exam” retake.
  • There will be no regrade window due to the necessary turnaround time for submitting grades. If you wish to discuss your grade, you must email Prof. Stephens-Martinez.

Specific information

  • Exam 1 Retake
  • Exam 2 Retake
    • The data set used for this exam is Seaborn’s titanic data set
    • If this is the only exam you are retaking, DO NOT come into the room until after the Exam 1 Retake is done (so aim to arrive after 10:30). Please do not disturb your peers.
    • See Exam 2 logistics for any other information

Optional Module: Git and Jupyter Notebooks

This module is 100% optional. It is intended as supplementary material if you plan to use git with your Jupyter Notebooks.

Content

A. Git Mental Model

B. Git with notebooks, how?

Recommended Reading

Exam 02 Logistics: In-Person Exam

This post outlines the in-person part of Exam 2. See the Practicum 2 or Practicum 2 Update posts for details on the other parts. Study exams are in the practice_exams Box folder.

  • When: Wednesday 04/10, during regular class time
  • Modules: 06 – 08
    • In addition, the data set used for some questions is Seaborn’s healthexp data. Make sure you are familiar with what the columns mean.
  • A calculator is strongly recommended. But if you forget, you will be okay because no exact calculations are required.
  • The formulas you will need are printed on the exam’s front page. See an example on the study exam in the practice_exams Box folder.
  • Once again, there is no code writing; the exam focuses more on thinking like a data scientist.
    • You will be expected to be able to read the results of calling the describe, groupby, and pivot_table functions.
  • All other details are the same as In-person Exam 1’s logistics, which are:
    • There are multiple versions
    • You may bring one piece of paper as a cheatsheet and can put things on the front and back.
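
To practice reading the output of describe, groupby, and pivot_table, a small sketch like the following may help. The data here is made up purely for illustration (it is not the exam's data set, and the column names are hypothetical); the point is only to see what shape of result each call produces.

```python
import pandas as pd

# Made-up toy data for illustration only (not the exam's data set).
df = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2000, 2010, 2000, 2010],
    "spending": [100.0, 200.0, 300.0, 500.0],
})

# describe: count, mean, std, min, quartiles, and max of one numeric column.
summary = df["spending"].describe()

# groupby: one aggregated value per group (here, mean spending per country).
by_country = df.groupby("country")["spending"].mean()

# pivot_table: rows are one variable, columns another, cells an aggregate.
table = df.pivot_table(index="country", columns="year",
                       values="spending", aggfunc="mean")

print(summary)
print(by_country)
print(table)
```

When reading such output, it helps to ask what each row label, column label, and cell value stands for before interpreting any single number.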

Grading Scale and Points Allocation

This is the same as Exam 1’s logistics in that problems are graded on an ESNU scale. How many points each rubric level is worth depends on the question. A question’s worth depends on how many questions in the exam are testing the same concept (more questions for that concept means fewer points for each question). The goal is that an exam with only S’s on every question results in a 90% because it indicates a Satisfactory level of understanding of all the concepts the exam is testing, rather than Exemplary.

Exam 02 Logistics: Practicum Update

This post outlines the Practicum 2 Update part of Exam 2. See the Practicum 2 or in-person Exam 2 posts for details on the other parts. Study exams are in the practice_exams Box folder.

  • When: Thursday 4/18 – Saturday 4/20
  • If you choose to submit a Practicum Update, the weighting is the same as for Practicum 1. It will be applied regardless of whether it helps or harms your grade:
    • Practicum (original): 15%
    • Practicum Update: 85%
  • It is your responsibility to work with your group to do the update.
  • For the update, you will do the following:
    • Update your original notebook as needed.
    • Add a “Practicum Update Diff” cell at the bottom worth 0.5 points per question. See details below.
    • We may grade outside of your changes.
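
The weighting above can be sketched as a short calculation. This is only an illustration: the function name is hypothetical, and it assumes that without an update the original Practicum score counts in full.

```python
def practicum_grade(original, update=None):
    """Combine scores using the 15% / 85% weighting when an update is submitted.

    Assumption: if no update is submitted, the original score stands as-is.
    """
    if update is None:
        return original
    return 0.15 * original + 0.85 * update


# Hypothetical scores: an update can help or harm the final grade.
print(practicum_grade(80, 100))  # update scored higher than the original
print(practicum_grade(90, 70))   # update scored lower than the original
```

Because the update carries most of the weight, a weaker update pulls the combined grade below the original score.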

Practicum Update Diff Cell

Do the following to create your diff cell and explain what you changed.

  1. Go to the bottom and add a new cell.
  2. Update the cell’s type to Markdown.
  3. Fill in the following template:
# Practicum Update Diff Cell
1. <1-2 sentences of how you changed question 1, or "No Changes">
2. <1-2 sentences of how you changed question 2, or "No Changes">
etc.

There is no need to provide a detailed report of what you changed. You must provide a numbered item per question regardless of whether you changed it (or you will not earn the 0.5 points for it). Here is an example of a diff cell that could have been provided for Practicum 1:

# Practicum Update Diff Cell
1. We did not change the code and only changed the writeup to better answer questions 2 and 3.
2. No changes
3. We cleaned up the code by removing the ==True and did not change the writeup.
4. We overhauled everything, code and writeup.

Exam 1 Logistics

This post outlines what Exam 1 will be like.

Two different pieces are considered part of the exam: the in-person Exam and the Practicum.

Exam General Information

  • Modules covered: 2, 3, 4, 5 (module 1 is covered in that you will be using Python in the Practicum)
  • Study Exams are in the exam Box folder
    • Solutions will be released on the Friday before the exam. This is to encourage everyone to try the study exams before looking at the solutions.

In-person Exam

  • When: Wednesday 2/28, during regular class time
  • Is in-person only
  • It is a paper exam taken during class.
  • There will be multiple versions.
  • It will have no code writing and focus more on thinking like a data scientist.
    • It may have the results of calling the describe function on a data set, so know what that function does.
  • Bring a calculator.
  • We will print and provide a reference sheet for you at the exam. See what it is in the exam Box folder.
  • You may bring one piece of paper as a cheatsheet and can put things on the front and back.

Practicum

  • When: Friday, 3/1, starts 12:01am EST, ends 11:59pm EST
    • It should take your group around 2-3 hours to complete, but your group can take as long as you want. It must be submitted before the end of the day.
    • There is no class on this day, so your group has the time to work on this.
  • This can be done in a group. See details below.
  • It is a take-home, open book, open note, open internet, and open LLM exam.
    • Each question will have a variable you set to True or False to indicate if you used an LLM to help you on this question.
  • It focuses on coding and interpreting the results of that code.
  • Consists of a Jupyter Notebook and a data set
    • Recommendation: Discuss in advance with your group how you will create the final submission and who will submit it.
  • A Canvas announcement will go out at the start of the exam with a link to the Box folder containing all the files you need.
  • It is closed to any person outside of your group. So, do not ask someone to do it for you or ask on places like Stack Overflow.
  • The act of submitting and being part of a submission means that you are upholding the Duke Community Standard that you contributed equally to this submission and only talked amongst yourselves when working on it.
  • Protect the integrity of the exam and your exam submission.
    • Take your exam:
      • in a secure location where only your group can see your screen or talk to you.
      • in a place where you will not be distracted or tempted to talk to someone outside of your group.
    • Only after grades have been published for the Practicum Update can you do the following. Doing any of these before grades are published will be considered a violation of the Duke Community Standard.
      • Discuss what you did on the exam.
      • Show your solutions to students outside of your group.
      • View other solutions.
  • If you have a question during the exam, ask it in a new private message on the class forum or in helper hours.
    • We cannot help you debug your code. If it appears as if the notebook or autograder is not working, but it turns out to be your own code that has a bug, you will be graded according to your submission.
    • We will do our best to always have someone checking the forum. However, we cannot make promises that someone will answer your question instantly.
    • The exam is tested for readability, so the wording should be straightforward.

Group requirements

  • You will submit through the group submission process on Gradescope.
  • Your group’s members may submit notebooks separately, as a single submission, or broken up into member subsets (e.g., a 3-person group could submit 2 notebooks where one person is solo and the others submit as a pair).
  • All members of a group must be listed in the notebook. There will be a 0-point test case with three variables for the NetIds of all those in the group. If you have fewer than 3 members, the notebook will state what to fill in for the other variables.
    • If you do not do this and we detect your notebooks as too similar to have been done in isolation, this is considered a violation of the Duke Community Standard.
  • Group Requirement: A notebook has no more than 3 people who contributed or collaborated on it.
    • 2 people have collaborated if one or both have given or received work/help on the Practicum. Notice these are “or’s.” That means if you share your Practicum with another person, even if that person did not give you anything in return, you both are now considered collaborators and should include each other in your submission as a collaborator/group member.
    • This also means that if 3 people submit together and then 1 person shares that submission with a 4th person who then submits something too similar to have been done in isolation, all 4 are considered collaborators because it is impossible to detect who shared with whom. This collaboration is then considered a violation of the rules and, therefore, a violation of the Duke Community Standard.

Practicum Update

  • Your group will have the option to update your Practicum after seeing the results of your Practicum grade. If you choose to submit an update, your grade for the Practicum will be as follows:
    • Practicum (original): 15%
    • Practicum Update: 85%
  • When: Thursday 3/21 – Saturday 3/23
    • Note that it is well after the original Practicum in order to avoid Spring Break.
    • This is during Module 8.
  • It is your responsibility to work with your group to do the update.
  • For the update, you will do the following:
    • Update your original notebook as needed.
    • Add a new Markdown cell at the bottom and list all of the changes you made from your original submission. Not doing this means we may not grade your update properly.
    • We may grade outside of your changes.

Grading Scale and Points Allocation

For the questions that do not have a clear correct or incorrect answer or where partial credit is warranted, the following rubric will be used.

  • E (Exemplary) – Work that meets all requirements and displays full mastery of all learning goals and material. And the code is clean and easy to read (see the study exam for examples of what this means).
  • S (Satisfactory) – Work that meets all requirements and displays at least partial mastery of all learning goals as well as full mastery of core learning goals.
  • N (Not yet) – Work that does not meet some requirements and/or displays developing or incomplete mastery of at least some learning goals and material.
  • U (Unassessable) – Work that is missing, does not demonstrate meaningful effort, or does not provide enough evidence to determine a level of mastery.

The number of points earned is distributed across the problems based on the number of learning goals they are testing. The rubric will be converted to points as follows:

  • E = full credit
  • S = E_full_credit – 1 or 2
  • N = E_full_credit * 0.6
  • U = E_full_credit * 0.2
  • Blank = 0

Unit tests in the autograder for the Practicum will earn you points up to, but not quite, the U level. Whether an S is 1 or 2 points fewer than an E depends on the exam part. Each exam part is out of 100 points. The goal is that earning only S’s results in a low A. So, for example, if the Practicum has only 4 questions, an S would lose 2 points compared to an E, which means getting all S’s is a low A (92%), but still guarantees an A on the Practicum.
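
The conversion above can be checked with a minimal sketch. The function name is hypothetical, and the worked case assumes the 4 questions are each worth an equal 25 points, which the post does not state; actual point values depend on the learning goals each question tests.

```python
def rubric_points(level, full_credit, s_penalty):
    """Convert an ESNU rubric level to points, following the listed rules.

    s_penalty is 1 or 2, depending on the exam part.
    """
    if level == "E":
        return full_credit
    if level == "S":
        return full_credit - s_penalty
    if level == "N":
        return full_credit * 0.6
    if level == "U":
        return full_credit * 0.2
    if level == "Blank":
        return 0
    raise ValueError(f"Unknown rubric level: {level}")


# Assumed layout: 4 questions worth 25 points each, S costs 2 points.
all_s_total = sum(rubric_points("S", 25, 2) for _ in range(4))
print(all_s_total)
```

Under that assumed layout, earning an S on every question totals 92 out of 100, matching the low A described above.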

Project: Group Formation

Due: Friday, January 26th

In place of a final exam, this course has a collaborative final project where we ask you to bring your data science skills to bear on a research project of your own choosing. It is time to start forming groups (of 4-5 students) for the project. Fill out the group formation quiz on Gradescope no later than Friday, January 26th.

The form should only take a couple of minutes. If you already know who you want to work with, you can indicate that in the form using the group submission feature in Gradescope. In this case, communicate with your group first and have one member fill out the form once with everyone added as group members. If you submit more than once, the active submission is considered valid. It’s also fine if you don’t know who you want to work with, in which case you should fill out the form solo, and we will match you to a group.

If it is helpful to start thinking about possible projects, some ideas are listed below. You can also brainstorm now using strategies that are outlined in the Initial Plan post (out soon). But it is not required that you have a concrete project idea until the proposal.

Project ideas

Not sure how to get started? Looking for examples of what a data science project might look like? Here are some of the topics that students studied in Spring 2020:

  • Comparing Stock Market Losses between SARS and SARS-CoV-2
  • Recessions, Depressions, and Depression: Mental Health in Relation to Economic Factors
  • Predicting North Carolina Election Outcomes
  • Relating Text Analysis of Corporate Reports and Stock Performance
  • Modeling Consumer Flight Behavior Based on Economic Indicators
  • Predicting COVID-19 Death Tolls from Google Search Trends
  • Sentiment Analysis of COVID-19 Tweets
  • Economic Status and Drug Overdose in North Carolina
  • Analyzing Gender and Tech Careers
  • Political Landscape According to Social Media
  • Forecasting Market Shocks and Performance using Article Headlines
  • Tracking Recidivism in US Prisons
  • Understanding AirBnB’s Impact on Evictions
  • Understanding Musical Tastes (Music Recommender System)
  • Human Impact on Climate since the Industrial Revolution
  • The Troll Toll: An Investigation into Troll Tweets

And here is an archive of summer Data+ projects from the last several years. In Data+, teams of about 4 undergraduate students collaborate over the summer on a data science project. You should be able to see final presentations and/or executive summary slides for most projects; feel free to browse for inspiration.

Example Data Sources

Below, we have some examples of datasets or where you might find data. You should work with data that is interesting to you and should feel free (strongly encouraged even) to look for sources yourself. These are listed just as possibilities and starting places.

  • Data.gov has a huge compilation of data sets produced by the US government. The US Census Bureau also publishes datasets from all of its survey work. Similarly, The Supreme Court Database tracks all cases decided by the US Supreme Court, and GovTrack.us provides links to all kinds of information about the US Congress and all votes cast by its members.
  • Duke University Library Digital Repository Research Data
  • ICPSR – An international consortium of more than 750 academic institutions and research organizations, Inter-university Consortium for Political and Social Research (ICPSR) provides leadership and training in data access, curation, and methods of analysis for the social science research community. ICPSR maintains a data archive of more than 250,000 files of research in the social and behavioral sciences. It hosts 21 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields.
  • The University of California Irvine maintains a large UCI ML repository of publicly contributed datasets aimed toward machine learning tasks of all types. They range from small simple example datasets to large and complicated datasets from specific scientific domains.
  • Kaggle maintains several thousand public datasets of interest in a variety of topics. Kaggle also hosts several prediction challenges; one idea for a machine learning project is to enter one of these competitions as a team.
  • The Yelp Dataset is provided by Yelp as a research challenge with lots and lots of data about reviews, businesses, images, and cities – text data, rich json data, etc.

Exam 02 Logistics: Practicum

This post outlines the Practicum 2 part of Exam 2. See the Practicum 2 Update or in-person Exam 2 posts for details on the other parts. Study exams are in the practice_exams Box folder.

  • When: Friday 4/12 12:01am to Saturday 4/13 11:59pm
    • There is no class on Friday.
    • It should take your group around 3-5 hours to complete, but your group can take as long as you want. It must be submitted before the end of the day.
    • NOTE: You have two days for this Practicum, and the expected time to complete it is longer than Practicum 1’s estimate. The Practicum 1 survey clearly showed that my prediction equation based on TA speed was incorrect, and I have adjusted it accordingly.
  • Modules: 06 – 08
  • All other details are the same as Practicum 1’s logistics, copy+pasted below for convenience.

Copy from Practicum 1’s logistics

  • This can be done in a group. See details below.
  • It is a take-home, open book, open note, open internet, and open LLM exam.
    • Each question will have a variable you set to True or False to indicate if you used an LLM to help you on this question.
  • It focuses on coding and interpreting the results of that code.
  • Consists of a Jupyter Notebook and a data set
    • Recommendation: Discuss in advance with your group how you will create the final submission and who will submit it.
  • A Canvas announcement will go out at the start of the exam with a link to the Box folder containing all the files you need.
  • It is closed to any person outside of your group. So, do not ask someone to do it for you or ask on places like Stack Overflow.
  • The act of submitting and being part of a submission means that you are upholding the Duke Community Standard that you contributed equally to this submission and only talked amongst yourselves when working on it.
  • Protect the integrity of the exam and your exam submission.
    • Take your exam:
      • in a secure location where only your group can see your screen or talk to you.
      • in a place where you will not be distracted or tempted to talk to someone outside of your group.
    • Only after grades have been published for the Practicum Update can you do the following. Doing any of these before grades are published will be considered a violation of the Duke Community Standard.
      • Discuss what you did on the exam.
      • Show your solutions to students outside of your group.
      • View other solutions.
  • If you have a question during the exam, ask it in a new private message on the class forum or in helper hours.
    • We cannot help you debug your code. If it appears as if the notebook or autograder is not working, but it turns out to be your own code that has a bug, you will be graded according to your submission.
    • We will do our best to always have someone checking the forum. However, we cannot make promises that someone will answer your question instantly.
    • The exam is tested for readability, so the wording should be straightforward.

Group requirements

  • You will submit through the group submission process on Gradescope.
  • Your group’s members may submit notebooks separately, as a single submission, or broken up into member subsets (e.g., a 3-person group could submit 2 notebooks where one person is solo and the others submit as a pair).
  • All members of a group must be listed in the notebook. There will be a 0-point test case with three variables for the NetIds of all those in the group. If you have fewer than 3 members, the notebook will state what to fill in for the other variables.
    • If you do not do this and we detect your notebooks as too similar to have been done in isolation, this is considered a violation of the Duke Community Standard.
  • Group Requirement: A notebook has no more than 3 people who contributed or collaborated on it.
    • 2 people have collaborated if one or both have given or received work/help on the Practicum. Notice these are “or’s.” That means if you share your Practicum with another person, even if that person did not give you anything in return, you both are now considered collaborators and should include each other in your submission as a collaborator/group member.
    • This also means that if 3 people submit together and then 1 person shares that submission with a 4th person who then submits something too similar to have been done in isolation, all 4 are considered collaborators because it is impossible to detect who shared with whom. This collaboration is then considered a violation of the rules and, therefore, a violation of the Duke Community Standard.

Grading Scale and Points Allocation

This is the same as Exam 1’s logistics.

Restart and Run All

Here is a guide on how to submit properly formatted .ipynb files for homework and exams.

This is important because a common way to detect bugs that the autograder might find is to first restart the kernel and run everything. Moreover, it is the equivalent of ensuring that you are submitting a polished notebook.

Steps to restart the kernel and run all

Go to the button labeled “Kernel” at the top of the page.

Click on the “Kernel” button to open its dropdown menu. Now click “Restart & Run All.”

A confirmation box will then appear. Click the red button labeled “Restart and Run All Cells.”

How to confirm it is correctly formatted

After following the steps above, your notebook cells’ “In [#] ” labels will be in numerical order. Make sure to confirm that all code cells are run. This is a properly formatted .ipynb file. 

Example of an incorrectly formatted notebook

A notebook whose labels are in numerical order but whose first cell does not start with “In [1]” is still deemed an improperly formatted .ipynb file.
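
If you would like to double-check the ordering yourself, here is a minimal sketch that inspects a notebook's JSON. It assumes the standard notebook layout (a top-level "cells" list whose code cells carry an "execution_count"); the function names are hypothetical, the file name in the comment is a placeholder, and this is not the course autograder's check. Empty code cells are skipped on the assumption that running them does not assign a number.

```python
import json


def execution_counts(notebook):
    """Return the execution counts of non-empty code cells, in document order."""
    counts = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        if not source.strip():  # assumption: empty cells get no number
            continue
        counts.append(cell.get("execution_count"))
    return counts


def looks_restarted_and_run_all(notebook):
    """True if the code cells are numbered 1, 2, 3, ... with no gaps."""
    counts = execution_counts(notebook)
    return counts == list(range(1, len(counts) + 1))


# In practice you would load a file, e.g.:
#   with open("my_notebook.ipynb") as f:   # placeholder file name
#       notebook = json.load(f)
good = {"cells": [
    {"cell_type": "markdown", "source": ["# Title"]},
    {"cell_type": "code", "source": ["x = 1"], "execution_count": 1},
    {"cell_type": "code", "source": ["x + 1"], "execution_count": 2},
]}
bad = {"cells": [
    {"cell_type": "code", "source": ["x = 1"], "execution_count": 2},
    {"cell_type": "code", "source": ["x + 1"], "execution_count": 3},
]}

print(looks_restarted_and_run_all(good))
print(looks_restarted_and_run_all(bad))
```

The second notebook is in numerical order but starts at 2, so the check reports it as not properly formatted.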