Due: Tuesday 11/9
For the final project, you will get to choose your own group of 3-4 people. You do not need to find a group if you do not know who you want to work with or you do not yet have enough members in your group. We will assign or form groups if needed. The form asks for your project interests, and we will match you to a group that aligns with those interests as closely as possible.
When you are ready, you will need everyone’s names and netids. Fill out the group formation form. You may fill out this form with fewer people; we will take that as an indication that you want to work together, and we will add more members to your group. If you have 3 people in your group, we may add a 4th to ensure everyone is in a group by the end of the group formation period.
Only fill this form out once per group. If your group fills this out more than once, we will take the last entry. Make sure to confirm with your group who is responsible for filling out the form.
The form already records who is filling it out, so its questions ask only about the other members of the group.
Project Ideas
It may help to start looking for project ideas and data sources as you form your groups.
Example Ideas
Not sure how to get started? Looking for examples of what a data science project might look like? Here are some of the topics that students studied in CS216 Spring 2020:
- Comparing Stock Market Losses between SARS and SARS-CoV-2
- Recessions, Depressions, and Depression: Mental Health in Relation to Economic Factors
- Predicting North Carolina Election Outcomes
- Relating Text Analysis of Corporate Reports and Stock Performance
- Modeling Consumer Flight Behavior Based on Economic Indicators
- Predicting COVID-19 Death Tolls from Google Search Trends
- Sentiment Analysis of COVID-19 Tweets
- Economic Status and Drug Overdose in North Carolina
- Analyzing Gender and Tech Careers
- Political Landscape According to Social Media
- Forecasting Market Shocks and Performance using Article Headlines
- Tracking Recidivism in US Prisons
- Understanding AirBnB's Impact on Evictions
- Understanding Musical Tastes (Music Recommender System)
- Human Impact on Climate since the Industrial Revolution
- The Troll Toll: An Investigation into Troll Tweets
And here is an archive of summer Data+ projects from the last several years. In Data+, teams of about 4 undergraduate students collaborate over the summer on a data science project. You should be able to see final presentations and/or executive summary slides for most projects; feel free to browse for inspiration.
Example Data Sources
Below are some example datasets and places where you might find data. You should work with data that is interesting to you, and you are free (strongly encouraged, even) to look for sources yourself. These are listed only as possibilities and starting points.
- Kaggle maintains several thousand public datasets of interest in a variety of topics. Kaggle also hosts several prediction challenges; one idea for a machine learning project is to enter one of these competitions as a team.
- The Yelp Dataset is provided by Yelp as a research challenge with lots and lots of data about reviews, businesses, images, and cities – text data, rich JSON data, etc.
- The University of California Irvine maintains the UCI ML repository, a large collection of publicly contributed datasets aimed at machine learning tasks of all types. They range from small, simple example datasets to large and complicated datasets from specific scientific domains.
- Data.gov has a huge compilation of datasets produced by the US government. The US Census Bureau also publishes datasets from all of its survey work. Similarly, The Supreme Court Database tracks all cases decided by the US Supreme Court, and GovTrack.us provides links to all kinds of information about the US Congress and all votes cast by its members.