Final Project Presentation

Due: Sunday 12/12, 11:59 PM

General Directions

The project presentation is intended to provide a high-level overview of your project to an audience of your peers (that is, individuals who have a reasonable knowledge of data science but are not experts in your particular project topic). Presentation recordings will be made available to the entire class (through Sakai, so not available outside of the class). The presentation should demonstrate your ability to communicate the significance and interpret the findings of your research project. The presentation should stand on its own so that it makes sense to someone who has not read your proposal or prototype.

Your group should create a video recording of your presentation in which every group member speaks and in which you use a visual aid such as presentation slides. The easiest way to do this is to hold a Zoom call with all members of your project group, share your screen with your presentation slides, and record either locally or to the cloud (see Zoom recording help information). If this is not possible, you can also record portions individually and combine the recordings (though this will require additional editing work). In the end, we will ask for a URL to your complete recording, so you can either provide a share link to a Zoom cloud recording or record locally and then upload your recording to Duke Box, Warpwire, or any other cloud platform where we can view your recording directly online (we should not need to download it to view it). Ensure that anyone with the link can view your recording.

In terms of length, the presentation should be between 8 and 12 minutes. You can have as many slides as are necessary, but a typical pace is 1-2 slides per minute, so 8-24 slides total would be reasonable. Your slides should prioritize well-labeled figures or visualizations and use text sparingly to emphasize important points. When you are finished, submit a PDF of your slides to Gradescope under the assignment “Project Presentation.” Be sure to include your names and NetIDs in your final document and use the group submission feature on Gradescope. Your first slide should include the URL where we can view the recording of your presentation.

Part 0: Title Slide

The very first slide of your presentation should be a title slide containing at least the following information:

  • A title of your project / presentation
  • Names of all group members
  • URL to recording of your presentation

Part 1: Introduction and Research Questions

Your presentation should begin by introducing your topic generally and posing your research questions. Provide some explanation of the relevance or motivation of your research questions.

Part 2: Data Sources

Discuss the data you have collected and are using to answer your research questions. Be specific: name the datasets you are using, the information they contain, and where they were collected from / how they were prepared.

Part 3: Results

Describe your results. Where possible, provide well-labeled and legible charts/figures in your slides to summarize results instead of verbose text. Interpret the results in the context of your research questions. It may not be possible to describe every individual result from your project in a brief amount of time. Focus on the most important and essential results for addressing your research questions.

Unlike your final report, a short presentation generally cannot describe your methods in enough detail for an informed audience member to reproduce your results. Instead, you should focus on your results and their interpretation, and discuss methods only at a high level, as far as is necessary to interpret the results.

Part 4: Limitations and Future Work

You should briefly discuss any important limitations or caveats to your results with respect to answering your research questions. For example, if you don’t have as much data as you would like or are unable to fairly evaluate the performance of a predictive model, explain and contextualize those limitations.

Finally, provide a brief discussion of future work. This could explain how future research might address the limitations you outline, or it could pose additional follow-up research questions based on your results so far. In short, explain how an informed audience member (such as a peer in the class) could improve on and extend your results.

Grading Rubric

Presentations will be evaluated on the following criterion-based rubric. Presentations satisfying all criteria will receive full credit.

  1. Submits a relevant document satisfying general requirements including a URL to a recording
  2. Includes a brief introduction to the topic of interest
  3. Poses one or more concrete research questions
  4. Provides a reasonable discussion of the relevance or motivation for the research questions
  5. Includes a discussion of concrete/specific data sources
  6. Provides results in the form of analysis, tables, visualization, etc.
  7. Tables and figures are properly labeled and legible
  8. Results are discussed and interpreted in the context of the research questions
  9. Provides a reasonable discussion of any limitations to the results
  10. Provides a reasonable discussion of future work and how the results could be extended
  11. The final recording is polished and easy to follow.

Final Project Report

Due: Sunday 12/12, 11:59 PM

General Directions

The final report is intended to provide a comprehensive account of your final course project. The report should demonstrate your ability to apply the data science skills you have learned to a real-world project in a holistic way from posing research questions and gathering data to analysis, visualization, interpretation, and communication. The report should stand on its own so that it makes sense to someone who has not read your proposal or prototype.

The report should contain at least the parts defined below. In terms of length, it should be about 5-7 pages using standard margins (1 in.), font (11-12 pt), and line spacing (1-1.5). A typical submission is around 3-4 pages of text and 5-7 pages overall with tables and figures. You should convert your written report to a PDF and upload it to Gradescope under the assignment “Project Final Report” by the due date. Be sure to include your names and NetIDs in your final document and use the group submission feature on Gradescope. You do not need to upload your accompanying data, code, or other supplemental resources demonstrating your work to Gradescope; instead, your report should contain instructions on how to access these resources (see Part 4 below for more details).

Part 1: Introduction and Research Questions

Your final report should begin by reintroducing your topic and restating your research question(s) as in your proposal. As before, your research question(s) should be (1) substantial, (2) feasible, and (3) relevant. In contrast to the prior reports, the final report does not need to explicitly justify in text that the research questions are substantial and feasible; your results should demonstrate both of these points. You should still explicitly justify how your research questions are relevant. In other words, be sure to explain the motivation for your research questions.

You can start with the text from your prototype, but you should update your introduction and research questions to reflect changes in or refinements of the project vision. Your introduction should be sufficient to provide context for the rest of your report.

Part 2: Summary of Results

Provide a brief (one or two paragraphs) summary of your results. This summary of results should address your research questions. For example, if one of your research questions was “Did COVID-19 result in bankruptcy in North Carolina during 2020?” then a possible (and purely hypothetical) summary of results might be “We aggregate the public records disclosures of small businesses in North Carolina from January 2019 to December 2020 and find substantial evidence that COVID-19 did result in a moderate increase in bankruptcy during 2020. This increase is not geographically uniform and is concentrated during summer and fall 2020. We also examined the impact of federal stimulus but cannot provide an evaluation of its impact from the available data.”

Part 3: Data Sources

Discuss the data you have collected and are using to answer your research questions. Be specific: name the datasets you are using, the information they contain, and where they were collected from / how they were prepared. You can begin with the text from your prototype but be sure to update it to fit the vision for your final project.

Part 4: Results and Methods

This is likely to be the longest section of your paper at multiple pages. The results and methods section of your report should explain your detailed results and the methods used to obtain them. Where possible, results should be summarized using clearly labeled tables or figures and supplemented with written explanations of the significance of the results with respect to the research questions outlined previously.

Your description of your methods should be specific. For example, if you scraped multiple web databases, merged them, and created a visualization, then you should explain how each step was conducted in enough detail that an informed reader could reasonably be expected to reproduce your results with time and effort. Just saying “we cleaned the data and dealt with missing values” or “we built a predictive model” is not sufficient detail, for example.
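As a rough illustration of the level of detail meant here, the sketch below records exactly which rows a cleaning step removes and why, so the report can state those numbers instead of a generic "we cleaned the data." The dataset, column names, and the `clean_filings` helper are all hypothetical and made up for this example; your own cleaning rules will differ.

```python
import csv
import io

# Hypothetical raw data: county, year, and filing counts are invented for illustration.
RAW_CSV = """county,year,filings
Durham,2019,120
Durham,2020,
Wake,2019,310
Wake,2020,355
Orange,2020,n/a
"""

def clean_filings(raw_text):
    """Parse the CSV text, drop rows whose filing count is blank or
    non-numeric, and return the cleaned rows plus a log of each drop."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    cleaned, log = [], []
    for row in rows:
        value = row["filings"].strip()
        if not value.isdigit():
            # Record what was removed and why, so it can be reported.
            log.append(f"dropped {row['county']} {row['year']}: filings={value!r}")
            continue
        cleaned.append(
            {"county": row["county"], "year": int(row["year"]), "filings": int(value)}
        )
    return cleaned, log

cleaned, log = clean_filings(RAW_CSV)
print(f"kept {len(cleaned)} of {len(cleaned) + len(log)} rows")
for entry in log:
    print(entry)
```

With a log like this, the methods text can say something specific, such as "we dropped 2 of 5 rows because the filing count was blank or non-numeric," which an informed reader could check and reproduce.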

Your report should also contain instructions on how to access your full implementation (that is, your code, data, and any other supplemental resources like additional charts or tables). The simplest way to do so is to include a link to the Box folder or whatever other platform your group is using to house your data and code. Make sure the permissions are set so that we can open the link.

Part 5: Limitations and Future Work

In this part, you should discuss any important limitations or caveats to your results with respect to answering your research questions. For example, if you don’t have as much data as you would like or are unable to fairly evaluate the performance of a predictive model, explain and contextualize those limitations.

Finally, provide a brief discussion of future work. This could explain how future research might address the limitations you outline, or it could pose additional follow-up research questions based on your results so far. In short, explain how an informed reader (such as a peer in the class) could improve on and extend your results.

Grading Rubric

Final reports will be evaluated on the following criterion-based rubric. Reports satisfying all criteria will receive full credit.

  1. Submits a relevant document satisfying general requirements
  2. Includes a brief introduction to the topic of interest
  3. Poses one or more concrete research questions
  4. Provides a reasonable justification that research questions are relevant
  5. Provides a brief summary of results
  6. Includes a discussion of concrete/specific data sources
  7. Provides results in the form of analysis, tables, visualization, etc.
  8. Final tables and visualizations are properly labeled and legible
  9. Results provide reasonable answers to research questions and interpretation is provided in the text. Some results may be negative or incomplete (with discussion) but should provide some concrete evidence toward answers to research questions.
  10. Results and methods demonstrate substantial effort and progress over the course of the project
  11. Methods used to obtain results are described in sufficient detail to understand and interpret results
  12. Methods used are generally appropriate and do not contain significant methodological errors
  13. Provides a link/reference to additional materials (e.g., code and data stored in Box or Colab)
  14. Provides a reasonable discussion of any limitations to the results
  15. Provides a reasonable discussion of future work and how the results could be extended
  16. Final writeup is edited and polished. Can have one or two typos or grammatical errors, but the document is sufficiently edited as to not distract or confuse the reader.

Final Project: Prototype

Due: Monday 11/22, with Zoom meeting on 11/23

General Directions

The prototype deliverable is intended to demonstrate a proof of concept for your final project report. Large multi-week projects are challenging; this deliverable is intended to provide additional structure to ensure you are making progress and are on a path toward success, or to help you discover if your group needs to change plans. It consists of a written report detailed below along with any accompanying data, code, or other supplementary resources that demonstrate your progress so far in the project. You can think of it as a rough draft for your final project. The report should stand on its own so that it makes sense to someone who has not read your proposal.

The report should contain at least the four parts defined below. In terms of length, it should be about 3-4 pages using standard margins (1 in.), font (11-12 pt), and line spacing (1-1.5). A typical submission is around 2-3 pages of text and 3-4 pages overall with tables and figures. You should convert your written report to a PDF and upload it to Gradescope under the assignment “Final Project Prototype” by the due date. Be sure to include your names and NetIDs in your final document and use the group submission feature on Gradescope. You do not need to upload your accompanying data, code, or other supplemental resources demonstrating your work to Gradescope; instead, your report should contain instructions on how to access these resources (see Part 3 below for more details).

Part 1: Introduction and Research Questions

Your prototype report should begin by reintroducing your topic and restating your research question(s) as in your proposal. Your research question(s) should be (1) substantial, (2) feasible, and (3) relevant. Briefly justify each of these points as in the project proposal. You can start with the text from your proposal, but you should update your introduction and research questions to reflect changes in or refinements of the project vision. Specifically, point out what has changed since the proposal. Your introduction should be sufficient to provide context for the rest of your report.

Part 2: Data Sources

After your introduction and research questions, your prototype should discuss the data you have collected and are using to answer your research questions. Be specific: name the datasets you are using and where they were collected from / how they were prepared. Briefly justify why your data are appropriate and sufficient to address your research questions. As in the introduction, you can begin with the text from your proposal but be sure to update it to fit with your evolving project.

Part 3: Preliminary Results and Methods

The preliminary results section of your report should summarize the results obtained so far in the project. Where possible, results should be summarized using clearly labeled tables or figures and supplemented with a written explanation of the significance of the results with respect to the research questions outlined in the previous section. Your results do not need to be final or conclusive for your entire project but should demonstrate substantial effort and progress and should provide concrete proof of concept or initial analysis with respect to your research questions.

Your results should be specific about exactly what data were used and how the results were generated. For example, if you scraped multiple web databases, merged them, and created a visualization, then you should explain how each step was conducted in enough detail that an informed reader could reasonably be expected to reproduce your results with time and effort. Just saying “we cleaned the data and dealt with missing values” is not sufficient detail, for example.

Your report itself should include an explanation of your methods, but it should also contain instructions on how to access your full implementation (that is, your code, data, and any other supplemental resources like additional charts or tables). The simplest way to do so is to include a link to the Box folder, Google Colab notebook, or whatever other platform your group is using to house your data and code.

Part 4: Reflection and Next Steps

In this part, you should begin by reflecting on the progress of your project so far. Address the following:

  1. What has been successful in the project so far or what is essentially complete and ready for the final report?
  2. What has been challenging in the project so far or what is incomplete in the prototype that needs to be finished for the final report?
  3. What are your next steps? These should be concrete and specific actions that your group will take to address the challenges identified in order to complete a successful final project.

Feedback

For feedback on your project, your team will meet with Prof. Stephens-Martinez over Zoom on Tuesday 11/23. We will use her Monday Zoom office hours link. To sign up for a timeslot, pick a time in the Doodle poll, using your team’s name/number.

Final Project: Proposal

Due: Sunday 11/14

This should be a document 1-2 pages in length that includes the following parts:

  1. What question(s) you plan to address
  2. The data set you will use
  3. A group plan

You will submit your proposal on Gradescope using the group submission feature, just like prior group submissions.

Part 1: Introduction and Research Questions

Your proposal should begin by introducing your topic in general and then defining one or more research questions. Research questions are the guiding questions you want to answer or problems you want to solve in your project. Your research question(s) should be (1) substantial, (2) feasible, and (3) relevant.

  1. Substantial research questions require more than a surface-level analysis (more than just computing basic summary statistics on readily available datasets, for example).
  2. Feasible research questions can actually be addressed by four or five team members over the course of approximately six weeks using data you can access.
  3. Relevant research questions address a subject of importance and interest within the scientific community or broader society.

You should provide a brief justification of your research question(s) with respect to each of these three points.

Part 2: Data Sources

Your project should deal with real data. We provide pointers to some data sources in the Project Ideas section of the group formation post, but you are welcome and encouraged to look for your own data sources. After your introduction and research questions, your proposal should discuss the data you will use to answer your research questions. Be as specific as possible: name the datasets you will use and how you will access them or specify where you will look for the relevant datasets and why you expect to be successful in finding them. You should also briefly justify why the data you plan to obtain will be relevant and appropriate for addressing your research questions. Searching for data sources as you refine your research questions is likely to be the most time-consuming part of preparing your proposal and is crucial for a good start on your project, so do not put it off.

Part 3: Group Plan

This should be similar to group 2’s plan and should answer items 3 through 7. You can, of course, also have a team name (item 2).

Feedback and Grading Rubric

Proposals will be evaluated on the following criterion-based rubric. Proposals satisfying all criteria will receive full credit. Formative feedback (comments and suggestions) will also be provided for each proposal.

  1. Satisfies general directions (length, on-time pdf submission, group submission, etc.)
  2. Includes a brief introduction to the topic of interest
  3. Poses one or more concrete research questions
  4. Provides a reasonable justification that research questions are substantial
  5. Provides a reasonable justification that research questions are feasible
  6. Provides a reasonable justification that research questions are relevant
  7. Includes one or more specific datasets or reasonable discussion of how to locate data
  8. Provides reasonable justification that data sources are appropriate for research questions
  9. Has a group plan that addresses (1) how you will communicate, (2) when, (3) where, and (4) how you will work together, and (5) a proposal of what happens if a team member cannot finish their planned work.

Final Project: Group formation

Due: Tuesday 11/9

For the final project, you will get to choose your own group of 3-4 people. You do not need to find a group if you do not know who you want to work with or you do not yet have enough members in your group. We will assign or form groups if needed. The form asks for your project interests, and we will match you to a group that aligns as closely with your interests as possible.

When you are ready, you will need everyone’s names and NetIDs. Fill out the group formation form. You may fill out this form with fewer people, which we will take as an indication that you want to work together, and we will add more members to your group. If you have 3 people in your group, we may add a 4th to ensure everyone is in a group by the end of the group formation period.

Only fill this form out once per group. If your group fills this out more than once, we will take the last entry. Make sure to confirm with your group who is responsible for filling out the form.

This form already records who is filling it out, so the questions ask only about the other members of the group.

Project Ideas

It may help to start looking for project ideas and data sources as you form your groups.

Example Ideas

Not sure how to get started? Looking for examples of what a data science project might look like? Here are some of the topics that students studied in CS216 Spring 2020:

  • Comparing Stock Market Losses between SARS and SARS-CoV-2
  • Recessions, Depressions, and Depression: Mental Health in Relation to Economic Factors
  • Predicting North Carolina Election Outcomes
  • Relating Text Analysis of Corporate Reports and Stock Performance
  • Modeling Consumer Flight Behavior Based on Economic Indicators
  • Predicting COVID-19 Death Tolls from Google Search Trends
  • Sentiment Analysis of COVID-19 Tweets
  • Economic Status and Drug Overdose in North Carolina
  • Analyzing Gender and Tech Careers
  • Political Landscape According to Social Media
  • Forecasting Market Shocks and Performance using Article Headlines
  • Tracking Recidivism in US Prisons
  • Understanding Airbnb’s Impact on Evictions
  • Understanding Musical Tastes (Music Recommender System)
  • Human Impact on Climate since the Industrial Revolution
  • The Troll Toll: An Investigation into Troll Tweets

And here is an archive of summer Data+ projects from the last several years. In Data+, teams of about 4 undergraduate students collaborate over the summer on a data science project. You should be able to see final presentations and/or executive summary slides for most projects; feel free to browse for inspiration.

Example Data Sources

Below, we have some examples of datasets or where you might find data. You should work with data that is interesting to you and should feel free (strongly encouraged even) to look for sources yourself. These are listed just as possibilities and starting places.

  • Kaggle maintains several thousand public datasets of interest in a variety of topics. Kaggle also hosts several prediction challenges; one idea for a machine learning project is to enter one of these competitions as a team.
  • The Yelp Dataset is provided by Yelp as a research challenge with lots and lots of data about reviews, businesses, images, and cities – text data, rich json data, etc.
  • The University of California Irvine maintains a large UCI ML repository of publicly contributed datasets aimed toward machine learning tasks of all types. They range from small simple example datasets to large and complicated datasets from specific scientific domains.
  • Data.gov has a huge compilation of data sets produced by the US government. The US Census Bureau also publishes datasets from all of its survey work. Similarly, The Supreme Court Database tracks all cases decided by the US Supreme Court, and GovTrack.us provides links to all kinds of information about the US Congress and all votes cast by its members.

Project 2

The zip file will be in the class Box folder in the Project folder. You will submit this as a group on Gradescope. This covers up to module 6. It is due Friday 10/29, with late submissions accepted until Sunday 10/31.

To work collaboratively, you can choose to use Google Colab. Put the file in your Google Drive and share it with your group. When you open the file, it will open in Google Colab. You should all be able to work on the notebook at the same time. However, editing the same cell at the same time may not work. You may notice that the data files are loaded over the internet rather than from local paths. This change makes working with Colab easier, since Colab does not hold onto data files between sessions.
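For anyone curious what loading data from a URL instead of a local path looks like, here is a minimal sketch using only the Python standard library. The `read_csv_rows` helper and the example URL are hypothetical and are not taken from the project notebook; in a notebook that uses pandas, `pandas.read_csv` also accepts a URL directly.

```python
import csv
import io
import urllib.request
from pathlib import Path

def read_csv_rows(source):
    """Return the rows of a CSV file as a list of dicts.

    `source` may be an http(s) URL or a local file path, so the same
    call works in Colab (remote data) and on your own machine (local data).
    """
    if str(source).startswith(("http://", "https://")):
        # Remote file: download the bytes and decode them as text.
        with urllib.request.urlopen(source) as response:
            text = response.read().decode("utf-8")
    else:
        # Local file: read it straight from disk.
        text = Path(source).read_text(encoding="utf-8")
    return list(csv.DictReader(io.StringIO(text)))

# Hypothetical URL, shown for illustration only:
# rows = read_csv_rows("https://example.com/data/project2.csv")
```

Because the data is fetched each time the cell runs, nothing needs to be re-uploaded when a Colab session restarts.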

Project 1

The zip file will be in the class Box folder in the Project folder. You will submit this as a group, similar to the group contract. This covers modules 1 through 3.

To work collaboratively, you can choose to use Google Colab. Put the file in your Google Drive and share it with your group. When you open the file, it will open in Google Colab. You should all be able to work on the notebook at the same time. However, editing the same cell at the same time may not work. You may notice that the data files are loaded over the internet rather than from local paths. This change makes working with Colab easier, since Colab does not hold onto data files between sessions.

Group Contract

To set expectations at the beginning, your group will fill out a group contract.

  1. Go to the group contract template. Read and discuss as a group what you all will do.
  2. One of your group mates should click on this link to make a Google doc copy of the contract for your own group.
  3. Make sure all group mates can edit the Google doc and fill it out.
  4. Download a pdf of the group contract and submit it as a group to Gradescope.
    1. If you do not know how to add group members to an assignment, Gradescope has a help page for this.

Due: 9/10 11:59 pm