Monthly Archives: April 2022

Mini-Exam 4 Retake

Mini-Exam 4 Retake Logistics

  • Timeframe: It will open Monday 4/25, 12:01 AM, and close Wednesday 4/27, 11:59 PM.
    • The exam will close at 11:59 pm regardless of when you started.
    • This window falls within the class’s final exam period.
  • It assesses the same material as Mini-Exam 4.
    • You may use material you have learned from modules beyond those being tested, but you can answer every question without knowing anything outside the tested modules.
  • The data sets and events will be different.
  • You do not need to do both parts; you may do only one part if you wish. However, you must answer ALL of the questions in any part you attempt. We will take the maximum score for each part.
  • All other information (getting the files, Gradescope, Sakai, asking for help, grading policy, etc.) is similar to Mini-Exam 1’s.

Mini-Exam 4 and Mini-Exam 3 Retake

Mini-Exam 4 Logistics

  • Module covered: 9
    • Module 10 is not tested in the mini-exams
  • Timeframe: It will open Thursday 4/14, 12:01 AM, and close Saturday 4/16, 11:59 PM.
    • The exam will close at 11:59 pm regardless of when you started.
  • The exam will be take-home. It is open book, open note, open internet, but closed to people.
    • This means you may not communicate with anyone about the exam, including asking for or receiving help online (e.g., on Stack Overflow).
  • Like prior mini-exams, it consists of two parts, each with a 2-hour time limit. Both parts use data sets, and the data sets differ between the two parts.
  • Grading policy specifics
    • Your answer will not be judged on the accuracy of the model, only on how you contextualize that accuracy relative to a simple probabilistic baseline. For example, if there are 4 categories and they are all equally likely, you compare against a baseline accuracy of 25%. However, if one category is present 50% of the time, the baseline accuracy is 50%, because a baseline model could simply always guess the most common category.
    • Warning: To measure mastery at a finer granularity, questions are split into smaller parts. As a result, the number of points per question part is not uniform.
  • All other information (getting the files, Gradescope, Sakai, asking for help, grading policy, etc.) is similar to Mini-Exam 1’s.
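The "always guess the most common category" baseline from the grading policy can be computed in a few lines. The sketch below is a minimal, unofficial Python illustration, not part of the exam materials; the function name `majority_baseline_accuracy` and the example labels are hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Return the accuracy of a baseline that always predicts the most common label.

    `labels` is any sequence of true category labels (hypothetical example data).
    """
    if len(labels) == 0:
        raise ValueError("labels must be non-empty")
    # Count how often each category appears.
    counts = Counter(labels)
    # The baseline always guesses the most frequent category,
    # so it is correct exactly as often as that category occurs.
    _, top_count = counts.most_common(1)[0]
    return top_count / len(labels)


# Four equally likely categories: baseline accuracy is 25%.
print(majority_baseline_accuracy(["a", "b", "c", "d"]))  # 0.25
# One category present 50% of the time: baseline accuracy is 50%.
print(majority_baseline_accuracy(["a", "a", "b", "c"]))  # 0.5
```

A model's accuracy is then interpreted relative to this number: 60% accuracy is a large improvement over a 25% baseline but only a modest one over a 50% baseline.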

Mini-Exam 3 Retake Logistics

  • Timeframe: It will open Thursday 4/14, 12:01 AM, and close Saturday 4/16, 11:59 PM.
    • The exam will close at 11:59 pm regardless of when you started.
  • It assesses the same material as Mini-Exam 3.
    • You may use material you have learned from modules beyond those being tested, but you can answer every question without knowing anything outside the tested modules.
  • The data sets and events will be different.
  • You do not need to do both parts; you may do only one part if you wish. However, you must answer ALL of the questions in any part you attempt. We will take the maximum score for each part.
  • All other information (getting the files, Gradescope, Sakai, asking for help, grading policy, etc.) is similar to Mini-Exam 1’s.

Project: Video Presentation

Due: Wednesday 4/20; late due date Friday 4/22 (no late penalty, and no homework slip days needed)

If 75% of the class fills out course evaluations, the presentation’s late due date will become Saturday 4/23. Go to the course evaluations page to find out how to fill them out.

General Directions

The project video presentation is intended to provide a high-level overview of your project to an audience of your peers (that is, individuals who have a reasonable knowledge of data science but are not experts in your particular project topic). Presentation recordings will be made available to the entire class through Sakai, so they will not be available outside the class. The presentation should demonstrate your ability to interpret the findings of your research project and communicate their significance. The presentation should stand on its own so that it makes sense to someone who has not read your proposal or prototype.

Your group should create a video recording of your presentation in which every group member speaks and you use a visual aid such as presentation slides. The easiest way to do this is to hold a Zoom call with all members of your project group, share your screen with your presentation slides, and record either locally or to the cloud (see Zoom recording help information). If this is not possible, you can also record portions individually and combine the recordings (though this will require additional editing work). In the end, we will ask for a URL to your complete recording, so you can either provide a share link to a Zoom cloud recording or record locally and then upload your recording to Duke Box, Warpwire, or any other cloud platform where we can access and view your recording directly online (we should not need to download it to view it). Ensure that anyone with the link can view your recording.

In terms of length, the presentation should be between 8 and 12 minutes. You can have as many slides as necessary, but a typical pace is 1-2 slides per minute, so 8-24 slides total would be reasonable. Your slides should prioritize well-labeled figures or visualizations and use text sparingly to emphasize important points. When you are finished, submit a PDF of your slides to Gradescope under the assignment “Project Video Presentation.” Be sure to include your names and NetIds in your final document and use the group submission feature on Gradescope. Your first slide should include the URL where we can view the recording of your presentation.

Part 0: Title Slide

The very first slide of your presentation should be a title slide containing at least the following information:

  • A title of your project/presentation
  • Names of all group members
  • URL to the video recording of your presentation

Part 1: Introduction and Research Questions

Your presentation should begin by introducing your topic generally and posing your research questions. Explain the relevance of, or motivation for, your research questions.

Part 2: Data Sources

Discuss the data you have collected and are using to answer your research questions. Be specific: name the datasets you are using, the information they contain, and where they were collected from/how they were prepared.

Part 3: Results

Describe your results. Where possible, use well-labeled and legible charts/figures in your slides to summarize results instead of verbose text. Interpret the results in the context of your research questions. It may not be possible to describe every individual result from your project in a short presentation, so focus on the most important and essential results for addressing your research questions.

Unlike in your final report, a short presentation generally cannot describe your methods in enough detail for an informed audience member to reproduce your results. Instead, you should focus on your results and their interpretation, and discuss methods only at the high level necessary to interpret the results.

Part 4: Limitations and Future Work

You should briefly discuss any important limitations or caveats to your results with respect to answering your research questions. For example, if you don’t have as much data as you would like or are unable to fairly evaluate the performance of a predictive model, explain and contextualize those limitations.

Finally, provide a brief discussion of future work. This could explain how future research might address the limitations you outline, or it could pose additional follow-up research questions based on your results so far. In short, explain how an informed audience member (such as a peer in the class) could improve on and extend your results.

Grading Rubric

Presentations will be evaluated on the following criterion-based rubric. Presentations satisfying all criteria will receive full credit.

  1. Submits a relevant document satisfying general requirements including a URL to a recording
  2. Includes a brief introduction to the topic of interest
  3. Poses one or more concrete research questions
  4. Provides a reasonable discussion of the relevance or motivation for the research questions
  5. Includes a discussion of concrete/specific data sources
  6. Provides results in the form of analysis, tables, visualization, etc.
  7. Tables and figures are properly labeled and legible
  8. Results are discussed and interpreted in the context of the research questions
  9. Provides a reasonable discussion of any limitations to the results
  10. Provides a reasonable discussion of future work and how the results could be extended
  11. The final recording is polished and easy to follow.

Project: Final Report

Due: Wednesday 4/20; late due date Friday 4/22 (no late penalty, and no homework slip days needed)

If 85% of the class fills out course evaluations, the report’s late due date will become Saturday 4/23. Note this is a HIGHER number than for the presentation. Go to the course evaluations page to find out how to fill them out.

General Directions

The final report is intended to provide a comprehensive account of your collaborative course project in data science. The report should demonstrate your ability to apply the data science skills you have learned to a real-world project holistically, from posing research questions and gathering data to analysis, visualization, interpretation, and communication. The report should stand on its own so that it makes sense to someone who has not read your proposal or prototype.

The report should contain at least the parts defined below. In terms of length, it should be 5-7 pages using standard margins (1 in.), font (11-12 pt), and line spacing (1-1.5). A typical submission is around 3-4 pages of text and 5-7 pages overall with tables and figures. It is important to stay within the page limit; practicing succinctness is an important skill. Convert your written report to a PDF and upload it to Gradescope under the assignment “Project Final Report” by the due date. Be sure to include your names and NetIds in your final document and use the group submission feature on Gradescope. You do not need to upload your accompanying data, code, or other supplemental resources to Gradescope; instead, your report should contain instructions on how to access these resources (see Part 3 below for more details).

In general, your approach to this report should be to write as if you had “planned this as your project all along.” A report is not a chronological story of your project. It is a summary of what you did where the “story” serves the reader’s comprehension.

Part 1: Introduction and Research Questions

Your final report should begin by reintroducing your topic and restating your research question(s) as in your proposal. As before, your research question(s) should be (1) substantial, (2) feasible, and (3) relevant. In contrast to the prior reports, the final report does not need to explicitly justify that the research questions are substantial and feasible in the text; your results should demonstrate both of these points. Therefore, you can remove that text to save space.

You should still explicitly justify how your research questions are relevant. In other words, be sure to explain the motivation of your research questions.

You can start with the text from your prototype, but you should update your introduction and research questions to reflect changes in or refinements of the project vision. Your introduction should be sufficient to provide context for the rest of your report.

Part 2: Data Sources

Discuss the data you have collected and are using to answer your research questions. Be specific: name the datasets you are using, the information they contain, and where they were collected from / how they were prepared. You can begin with the text from your prototype but be sure to update it to fit the vision for your final project.

Part 3: Results and Methods

This is likely to be the longest section of your report, spanning multiple pages. The results and methods section of your report should explain your detailed results and the methods used to obtain them. Where possible, results should be summarized using clearly labeled tables or figures and supplemented with written explanations of the significance of the results with respect to the research questions outlined previously.

Your description of your methods should be specific. For example, if you scraped multiple web databases, merged them, and created a visualization, then you should explain how each step was conducted in enough detail that an informed reader could reasonably be expected to reproduce your results with time and effort. Just saying “we cleaned the data and dealt with missing values” or “we built a predictive model” is not sufficient detail, for example.

Your report should also contain instructions on how to access your full implementation (that is, your code, data, and any other supplemental resources like additional charts or tables). The simplest way to do so is to include a link to the Box folder, GitLab repo (if you wish to keep the repo private, add Prof. Stephens-Martinez (username: ksteph) and your mentor to the repo), or whatever other platform your group is using to house your data and code.

Part 4: Limitations and Future Work

In this part, you should discuss any important limitations or caveats to your results with respect to answering your research questions. For example, if you don’t have as much data as you would like or are unable to fairly evaluate the performance of a predictive model, explain and contextualize those limitations.

Finally, provide a brief discussion of future work. This could explain how future research might address the limitations you outline, or it could pose additional follow-up research questions based on your results so far. In short, explain how an informed reader (such as a peer in the class) could improve on and extend your results.

Part 5: Conclusion

Provide a brief (one or two paragraphs) summary of your results. This summary of results should address your research questions. For example, if one of your research questions was “Did COVID-19 result in bankruptcy in North Carolina during 2020?” then a possible (and purely hypothetical) summary of results might be “We aggregate the public records disclosures of small businesses in North Carolina from January 2019 to December 2020 and find substantial evidence that COVID-19 did result in a moderate increase in bankruptcy during 2020. This increase is not geographically uniform and is concentrated during summer and fall 2020. We also examined the impact of federal stimulus but cannot provide an evaluation of its impact from the available data.”

(Optional) Part 6: Appendix of additional figures and tables

If you are struggling to keep your report within the 5-7 page limit, you may move some of your figures and tables to an optional appendix that will not count against your page limit. However, your report should stand on its own without the appendix. The appendix is for adding more nuance to your results, not to give you more space to talk about your results. Succinctness is an important skill to practice when doing data science.

Grading Rubric

Final reports will be evaluated on the following criterion-based rubric. Reports satisfying all criteria will receive full credit.

  1. Submits a relevant document satisfying general requirements – If you submit a report that is over the page limit (not counting the appendix), you will lose points.
  2. Includes a brief introduction to the topic of interest
  3. Poses one or more concrete research questions
  4. Provides a reasonable justification that research questions are relevant
  5. Includes a discussion of concrete/specific data sources
  6. Provides results in the form of analysis, tables, visualization, etc.
  7. Final tables and visualizations are properly labeled and legible
  8. Results provide reasonable answers to research questions and interpretation is provided in the text. Some results may be negative or incomplete (with discussion) but should provide some concrete evidence toward answers to research questions.
  9. Results and methods demonstrate substantial effort and progress over the course of the project
  10. Methods used to obtain results are described in sufficient detail to understand and interpret results
  11. Methods used are generally appropriate and do not contain significant methodological errors
  12. Provides a link/reference to additional materials (e.g., code and data stored in Box or GitLab)
  13. Provides a reasonable discussion of any limitations to the results
  14. Provides a reasonable discussion of future work and how the results could be extended
  15. Provides a conclusion
  16. Final writeup is edited and polished. It may contain one or two typos or grammatical errors, but the document is edited well enough that errors do not distract or confuse the reader.