Project: Video Submission

Due: Friday, 12/6  – NOTE: there will be no regrade window for this milestone.

If your group would like written feedback, please fill out this form.

General Directions

The project video presentation is intended to provide a high-level overview of your project to an audience of your peers (that is, individuals who have a reasonable knowledge of data science but are not experts in your particular project topic). The presentation should demonstrate your ability to communicate the significance and interpret the findings of your research project. The presentation should stand on its own so that it makes sense to someone who has not read your proposal or prototype.

Your group should create a video recording of your presentation in which every group member speaks and in which you use a visual aid such as presentation slides. The easiest way to do this is to hold a Zoom call with all members of your project group, share your screen with your presentation slides, and record either locally or to the cloud (see Zoom recording help information). If this is not possible, you can also record portions individually and combine the recordings (though this will require additional editing work). In the end, we will ask for a URL to your complete recording, so you can either provide a share link to a Zoom cloud recording or record locally and then upload your recording to Duke Box, Warpwire, or any other cloud platform where we can view your recording directly online (we should not need to download the recording to view it). Ensure that anyone with the link can view the recording.

In terms of length, the presentation should be between 8 and 12 minutes. You can have as many slides as are necessary, but a typical pace is 1-2 slides per minute, so 8-24 slides total would be reasonable. Your slides should prioritize well-labeled figures or visualizations and use text sparingly to emphasize important points. The text should also be large enough to read easily. When you are finished, you will submit a PDF of your slides to Gradescope under the assignment “Project Video Presentation.” Be sure to include your names and NetIDs in your final document and use the group submission feature on Gradescope. Your first slide should include the URL where we can view the recording of your presentation.

Grading

  • E (Exemplary, 10pts) – Video presentation is between 8 and 12 minutes.
  • S (Satisfactory, 9 pts) – Video presentation is over 12 minutes.
  • N (Not yet, 6pts) – Video presentation does not reach 8 minutes.
  • U (Unassessable, 2pts) –  Video presentation is missing or does not demonstrate meaningful effort.

Part 0: Title Slide

The very first slide of your presentation should be a title slide containing at least the information below. The title slide does not need to appear in the actual video recording.

  • A descriptive title of your project/presentation, not “CS216 Presentation Video”
  • Names of all group members
  • URL to the video recording of your presentation

Grading

  • E (Exemplary, 10pts) – Work that meets all requirements.
  • S (Satisfactory, 9pts) – The title is not descriptive but meets all other requirements.
  • N (Not yet, 6pts) – Does not meet all requirements (for example, the URL for the video recording is missing).

Part 1: Introduction and Research Questions

Your presentation should begin by introducing your topic generally and posing your research questions. Provide some explanation of the relevance or motivation of your research questions. These slides should serve to provide context surrounding the questions and potential broader effects of the problem.

Grading

  • E (Exemplary, 20 pts) – General introduction to topic and clearly defined research questions and their motivations.
  • S (Satisfactory, 19 pts) – General introduction to topic and clearly defined research questions. Discussion of motivations may be missing.
  • N (Not yet, 12 pts) – General introduction to topic. Research questions and motivations are not clearly defined.
  • U (Unassessable, 4 pts) – Introduction and research questions are missing or do not demonstrate meaningful effort.

Part 2: Data Sources

Discuss the data you collected and used to answer your research questions. Be specific: name the datasets you are using, the information they contain, and where they were collected from/how they were prepared (i.e., cleaning).

Grading

  • E (Exemplary, 20 pts) – Origins of data are properly specified, cited, and include discussion of what information they contain. Any relevant data wrangling, cleaning, or other data preparation is explained.
  • S (Satisfactory, 19 pts) – Origins of data are properly specified, cited, and include discussion of what information they contain. Any relevant data wrangling, cleaning, or other data preparation may be missing or could be improved.
  • N (Not yet, 12 pts) – Poorly specified data sources and lack of discussion of preparing the dataset.
  • U (Unassessable, 4 pts) – Discussion of data sources and data preparation are missing or do not demonstrate meaningful effort.

Part 3: Results

Describe your results. Where possible, provide well-labeled and legible charts/figures in your slides to summarize results instead of verbose text. Interpret the results in the context of your research questions. It may not be possible to describe every individual result from your project in a brief amount of time. Focus on the most important and essential results for addressing your research questions. Please note that a screenshot of your dataset does not count as a table or figure and should not be included in your video presentation.

Unlike your final report, it is not generally possible to describe your methods in sufficient detail in a short presentation so that an informed audience member could reproduce your results. Instead, you should focus on your results and their interpretation, and only discuss methods at a high level such as may be necessary to interpret the results.

Example of Interpreting results

Do not: “When we conducted our hypothesis test, we found that p < 0.05, so our results are significant.”

Do: “Since our p-value is significant, we could determine that generation 1 pokemon have a different popularity than all other pokemon. And since the mean popularity of generation 1 is higher than the mean of all the other pokemon, we can conclude that generation 1 is on average more popular.” [The slide shows the p-value]
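
To make the difference between the two statements concrete, here is a minimal sketch of going from a test result to an interpretation. It assumes made-up popularity scores and uses a permutation test on the difference in means as a stand-in for whatever hypothesis test your group actually ran; the function names and data are hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, p_value), where the difference is
    mean(group_a) - mean(group_b).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_permutations + 1)


def interpret(observed, p_value, alpha=0.05):
    """Turn the test output into a sentence that answers the research question."""
    if p_value >= alpha:
        return ("We did not find evidence that generation 1 differs in "
                "mean popularity from the other generations.")
    direction = "more" if observed > 0 else "less"
    return (f"Generation 1 is on average {direction} popular than the other "
            f"generations (difference in means = {observed:.1f}, "
            f"p = {p_value:.3f}).")


# Made-up popularity scores, for illustration only.
gen1 = [82, 91, 77, 88, 95, 84, 90, 79]
others = [61, 70, 58, 66, 73, 64, 69, 60]

observed, p_value = permutation_test(gen1, others)
print(interpret(observed, p_value))
```

The point of the `interpret` step is that the spoken sentence states the direction of the effect and ties it back to the research question, while the p-value itself only needs to appear on the slide.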

Grading

  • E (Exemplary, 20 pts) – Most important and essential results are thoroughly discussed using labeled tables or figures followed by an interpretation of the results in the context of the research questions. 
  • S (Satisfactory, 19 pts) – Results are thoroughly discussed using labeled tables or figures followed by an interpretation of the results in the context of the research questions. May be missing an important result that should have been included.
  • N (Not yet, 12 pts) – Results are discussed using tables with missing labels or lacking interpretation in the context of the research questions.
  • U (Unassessable, 4 pts) – Results are missing or do not demonstrate meaningful effort. 

Part 4: Limitations and Future Work

You should briefly discuss any important limitations or caveats to your results with respect to answering your research questions. For example, if you don’t have as much data as you would like or are unable to fairly evaluate the performance of a predictive model, explain and contextualize those limitations. You may want to consider any ethical implications or acknowledge potential biases in the results. 

Finally, provide a brief discussion of future work. This could explain how future research might address the limitations you outline, or it could pose additional follow-up research questions based on your results so far. In short, explain how an informed audience member (such as a peer in the class) could improve on and extend your results.

Grading

  • E (Exemplary, 20 pts) – Comprehensive and explicit discussion of important limitations and caveats to results. Brief discussion of future work and how results could be extended and improved upon.
  • S (Satisfactory, 19 pts) – Comprehensive and explicit discussion of important limitations and caveats to results. Discussion of future work and how results could be extended and improved upon may lack some specification.
  • N (Not yet, 12 pts) – Incomplete discussion of important limitations and caveats to results. Discussion of future work and how results could be extended and improved upon may lack some specification.
  • U (Unassessable, 4 pts) – Limitations and future work are missing or do not demonstrate meaningful effort.

Checklist Before You Submit:

  1. Is your video presentation between 8 and 12 minutes in length?
  2. Does your first slide satisfy all requirements?
    1. A title of your project/presentation
    2. Names of all group members
    3. URL to the video recording of your presentation
  3. Do you have an Introduction and clearly stated Research Question(s)?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  4. Have you properly specified/cited one or more specific Data Sources and justified why they are relevant to the Research Questions?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  5. Have you reported all of your important Results, including an interpretation of them in the context of the research questions?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  6. Have you defined clear Limitations to your results and Future Work?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?

Project: Final Report

Due: Friday, 12/6 – NOTE: there will be no regrade window for this milestone.

If your group would like written feedback, please fill out this form.

General Directions

The final report is intended to provide a comprehensive account of your collaborative course project in data science. The report should demonstrate your ability to apply the data science skills you have learned to a real-world project holistically, from posing research questions and gathering data to analysis, visualization, interpretation, and communication. The report should stand on its own so that it makes sense to someone who has not read your proposal or prototype.

The report should contain at least the parts defined below. In terms of length, it should be 5-7 pages using standard margins (1 in.), font (11-12 pt), and line spacing (1-1.5). A typical submission is around 3-4 pages of text and 5-7 pages overall with tables and figures. It is important to stay within the page limit, as being succinct is an important skill to practice. Your final report should also have a descriptive title, not “CS216 Project Report”. You should convert your written report to a PDF and upload it to Gradescope under the assignment “Project Final Report” by the due date, and assign the appropriate pages to questions in the grading rubric. Be sure to include your names and NetIDs in your final document and use the group submission feature on Gradescope. You do not need to upload your accompanying data, code, or other supplemental resources demonstrating your work to Gradescope; instead, your report should contain instructions on how to access these resources (see the Results and Methods section below for more details).

In general, your approach to this report should be to write as if you had “planned this as your project all along.” A report is not a chronological story of your project. It is a standalone summary of what you did where the “story” serves the reader’s comprehension. You provide enough information that the results are understandable, and your reader is convinced that the results are reasonable.

Grading

  • E (Exemplary, 20 pts) – Work that meets all requirements in terms of formatting and sections; report includes a descriptive title.
  • S (Satisfactory, 19 pts) – Work that meets all requirements but is over 7 pages.
  • N (Not yet, 12pts) – Does not meet all requirements.
  • U (Unassessable, 4pts) –  Missing at least one section.

Part 1: Introduction and Research Questions (15 points)

Your final report should begin by introducing your topic and restating your research question(s) as in your proposal. As before, your research question(s) should be (1) substantial, (2) feasible, and (3) relevant. In contrast to the prior reports, the final report does not need to explicitly justify that the research questions are substantial and feasible in the text; your results should demonstrate both of these points. Therefore, you should remove that text to save space.

You should still explicitly justify how your research questions are relevant, although this should not be bolded as in previous milestones. In other words, be sure to explain the motivation of your research questions. Remember that relevant research questions address a subject of importance and interest within the scientific community or broader society. Additionally, we are looking for why your group believes this research project is worth your time in this course.

You can start with the text from your prototype, but you should update your introduction and research questions to reflect changes in or refinements of the project vision. Do not state specific updates; rather, write the report as the final product, as if the prior milestones did not exist and your readers were unaware of them. If you feel that an explanation of changes since the prototype is warranted, place it in the appendix. Your introduction should be sufficient to provide context for the rest of your report, and the content should be properly cited if it is drawn from external sources.

Grading

  • E (Exemplary, 15pts) – Comprehensive introduction with clearly labeled, up-to-date research questions and a justification for how the research questions are relevant. The introduction can stand alone without references to prior versions of the project; it contains no explicit justification that the research question(s) are “substantial” and “feasible”.
  • S (Satisfactory, 14pts) – Comprehensive introduction with clearly labeled, updated research questions and a justification for how the research questions are relevant. The introduction and research questions may not have been refined from the prototype (they have still kept reasoning for why their research questions are substantial and feasible).
  • N (Not yet, 9pts) – Incomplete introduction where the research questions or justification are missing pieces, but at least some of it is present. Or the justification is clearly not reasonable.
  • U (Unassessable, 3pts) – Incomplete introduction where it is entirely missing the research questions or justification or does not demonstrate meaningful effort.

Part 2: Data Sources (15 points)

Discuss the data you have collected and are using to answer your research questions. Be specific: name the datasets you are using, the information they contain, and where they were collected from / how they were prepared. You can begin with the text from your prototype, but be sure to update it to fit the vision for your final project. All data sources should be properly cited in this report.

Grading

  • E (Exemplary, 15pts) – Origins of all data are properly specified, cited, and relevant to answering the research question(s). If any significant data wrangling, cleaning, or other data preparation was done, these processes are explained.
  • S (Satisfactory, 14pts) – Origins of all data are properly specified and cited. However, the justification is not clear why the data is relevant to the proposed research question(s). If any significant data wrangling, cleaning, or other data preparation was done, these processes are explained.
  • N (Not yet, 9pts) – Poorly specified data sources or the justification for using that data set or the methods to acquire the data is lacking. No discussion of preparing the dataset.
  • U (Unassessable, 3pts) – Data sources or methods to acquire data are missing or do not demonstrate meaningful effort.

Part 3: What Modules Are You Using? (15 points)

Your project should utilize concepts from modules we have covered in this course to answer your research question(s). We will assume you will use modules 1 (Python), 2 (Numpy/Pandas), and 5 (Probability).  Your final report should state at least 3 more modules that you have utilized for your project. Each module should have a short description of how you used the knowledge in this module and a justification for that use. In addition, include what specific concepts from the module you used and at what stage of your project you mostly used this module. Potential stages include, but are not limited to: data gathering, data cleaning, data investigation, data analysis, and final report.

  • Module 3: Visualization
  • Module 4: Data Wrangling
  • Module 6: Combining Data
  • Module 7: Statistical Inference
  • Module 8: Prediction & Supervised Machine Learning
  • Module 9: Databases and SQL
  • Module 10: Deep Learning

Your overall report should clearly show that you used the modules discussed in this section. You should add any additional modules used and update the existing modules to be more specific to the different tasks and stages of your projects that changed since your prototype.

Grading

  • E (Exemplary, 15pts) – States at least 3 modules. For each module, they provide an updated (1) short description of how they used the module, (2) justification for using this module, (3) specific concepts they used, (4) what stage they used it, and (5) clearly implemented it in the final report. 
  • S (Satisfactory, 14pts) – States at least 3 modules. For each module, they provide an updated (1) short description of how they used the module, (2) justification for using this module, (3) what concepts they used and (4) what stage they used it. Fewer than 3 modules are clearly implemented in the final report. 
  • N (Not yet, 9pts) – States at least 3 modules. For each module, they provide an updated (1) short description of how they used the module, (2) justification for using this module, (3) what concepts they used and (4) what stage they used it. Only one module is clearly implemented in the final report.
  • U (Unassessable, 3pts) – Does not meet the Not Yet criteria.

Part 4: Results and Methods (15 points)

This is likely to be the longest section of your paper, spanning multiple pages. The results and methods section of your report should explain your detailed results and the methods used to obtain them. Where possible, results should be summarized using clearly labeled tables or figures (e.g., Fig. 1: XYZ) and supplemented with written explanations of the significance of the results with respect to the research questions outlined previously. Please note that a screenshot of your dataset or a screenshot of a dataframe does not count as a table or figure and should not be included in your final report.

Your description of your methods should be specific. For example, if you scraped multiple web databases, merged them, and created a visualization, then you should explain how each step was conducted in enough detail that an informed reader could reasonably be expected to reproduce your results with time and effort. Just saying, “we cleaned the data and dealt with missing values” or “we built a predictive model” is insufficient detail.
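
One way to collect that level of detail is to have the cleaning code record what each step did, so the counts can be quoted directly in the methods text. The following is a minimal sketch under assumed conditions: the rows, field names, and the `clean_records` helper are all hypothetical, and your own pipeline will likely use pandas and different steps.

```python
def clean_records(records, required_fields):
    """Drop incomplete and duplicate rows, recording what each step removed."""
    log = []

    # Step 1: keep only rows where every required field has a value.
    complete = [
        r for r in records
        if all(r.get(f) not in (None, "") for f in required_fields)
    ]
    log.append(
        f"Dropped {len(records) - len(complete)} of {len(records)} rows "
        f"missing any of {required_fields}."
    )

    # Step 2: keep the first occurrence of each combination of required fields.
    seen = set()
    unique = []
    for r in complete:
        key = tuple(r[f] for f in required_fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    log.append(
        f"Dropped {len(complete) - len(unique)} duplicate rows "
        f"(same values for {required_fields})."
    )

    return unique, log


# Hypothetical rows, for illustration only.
raw = [
    {"county": "Durham", "year": 2020, "filings": 14},
    {"county": "Durham", "year": 2020, "filings": 14},  # exact duplicate
    {"county": "Wake", "year": 2020, "filings": None},  # missing value
    {"county": "Orange", "year": 2020, "filings": 5},
]

cleaned, log = clean_records(raw, ["county", "year", "filings"])
for line in log:
    print(line)
```

Each log line states which rule was applied and how many rows it affected, which is the kind of specific statement a reader needs in order to reproduce the step.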

Your report should also contain instructions on how to access your full implementation (that is, your code, data, and any other supplemental resources like additional charts or tables). The simplest way to do so is to include a link to the Box folder, GitLab repo, or whatever other platform your group is using to house your data and code.

Grading

  • E (Exemplary, 15pts) – Results are thoroughly discussed using clearly labeled tables or figures followed by written descriptions. Specific explanation of how the results were generated and from what data. Link to code/data to create charts or visualizations is provided. 
  • S (Satisfactory, 14pts) – Results are thoroughly discussed using clearly labeled tables or figures followed by written descriptions. Explanation of how the results were generated may lack some specification or it is somewhat unclear as to what data the results are from. Link provided.
  • N (Not yet, 9pts) – Results are discussed using tables with missing labels or lacking written descriptions. It is unclear how the results were generated and from what data.
  • U (Unassessable, 3pts) – Results are missing or do not demonstrate meaningful effort.

Part 5: Limitations and Future Work (10 points)

In this part, you should discuss any important limitations or caveats to your results with respect to answering your research questions. For example, if you don’t have as much data as you would like or are unable to fairly evaluate the performance of a predictive model, explain and contextualize those limitations. You may want to consider any ethical implications or potential biases of your results as well. 

Finally, provide a brief discussion of future work. This could explain how future research might address the limitations you outline, or it could pose additional follow-up research questions based on your results so far. In short, explain how an informed reader (such as a peer in the class) could improve on and extend your results.

Grading

  • E (Exemplary, 10pts) – Comprehensive and explicit discussion of important limitations and caveats to results. Brief discussion of future work and how results could be extended and improved upon.
  • S (Satisfactory, 9pts) – Discussion of important limitations and caveats to results could be improved or the discussion of future work and how results could be extended and improved upon lacks some specification.
  • N (Not yet, 6pts) –  Incomplete discussion of important limitations and caveats to results. Discussion of future work and how results could be extended and improved upon may lack some specification.
  • U (Unassessable, 2pts) – Limitations and future work are missing or do not demonstrate meaningful effort.

Part 6: Conclusion (10 points)

Provide a brief (one or two paragraphs) summary of your results. This summary of results should address all of your research questions.

Example

If one of your research questions was “Did COVID-19 result in bankruptcy in North Carolina during 2020?” then a possible (and purely hypothetical) summary of results might be:

We aggregate the public records disclosures of small businesses in North Carolina from January 2019 to December 2020 and find substantial evidence that COVID-19 did result in a moderate increase in bankruptcy during 2020. This increase is not geographically uniform and is concentrated during summer and fall 2020. We also examined the impact of federal stimulus but cannot provide an evaluation of its impact from the available data.

Grading

  • E (Exemplary, 10pts) – Research questions are clearly and completely addressed through a summary of results. 
  • S (Satisfactory, 9pts) – Research questions are clearly addressed through a summary of results. The results may be lacking in completely answering the research questions.
  • N (Not yet, 6pts) – Research questions are somewhat addressed through a summary of results. The results are lacking in completely answering the research questions. Or the results for one of the research questions are missing.
  • U (Unassessable, 2pts) – Conclusion is missing or does not demonstrate meaningful effort.

(Optional) Part 7: Appendix of additional figures, tables, and updates summary.

If you are struggling to keep your report within the 5-7 page limit, you may move some (not all) of your figures and tables to an optional appendix that will not count against your page limit. However, your report should stand on its own without the appendix. The appendix is for adding more nuance to your results, not to give you more space to talk about your results. Succinctness is an important skill to practice when doing data science. Your grader is not expected to look at the appendix when grading.

If you strongly feel like a summary of project updates since the proposal is required, you may put them in this appendix as well and mention they are in the appendix in the introduction.

Checklist Before You Submit:

  1. Does your final report satisfy all general directions?
    1. 5-7 pages in length
    2. Standard margins (1 in.)
    3. Font size is 11-12 pt
    4. Line spacing is 1-1.5
    5. Final document is a pdf
    6. Descriptive project title
  2. Do you have an Introduction and clearly stated Research Question(s)?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  3. Have you properly specified/cited one or more specific Data Sources and justified why they are relevant to the Research Questions?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  4. Did you state at least 3 Modules and how you used them, along with a justification, the specific concepts used, and the stage of the project where each was used?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  5. Have you reported all of your Results and Methods, including a specific explanation of how the results were generated?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  6. Have you defined clear Limitations to your results and Future Work?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  7. Have you written a comprehensive Conclusion?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?

Project: Prototype

Due: Saturday, November 9th

General Directions

The prototype deliverable is intended to demonstrate a proof of concept for your final project report. Large multi-week projects are challenging — this deliverable is intended to provide additional structure to ensure you are making substantial progress and are on a path toward success, as well as to get any help your team may need.

It consists of a written report detailed below, along with any accompanying data, code, or other supplementary resources that demonstrate your progress so far in the project. You can think of it as a rough draft for your final project. The report should stand on its own so that it makes sense to someone who has not read your proposal.

The report should contain at least five parts, which we define below. In terms of length, it should be 3-4 pages using standard margins (1 in.), font (11-12 pt), and line spacing (1-1.5). A typical submission is around 2-3 pages of text and 3-4 pages overall with tables and figures. You should convert your written report to a pdf and upload it to Gradescope under the assignment “Project Prototype” by the due date. Be sure to include your names and NetIDs in your final document and use the group submission feature on Gradescope. You do not need to upload your accompanying data, code, or other supplemental resources demonstrating your work to Gradescope; instead, your report should contain instructions on how to access these resources (see parts 2 and 4 below for more details).

  • E (Exemplary, 30pts) – Work that meets all formatting requirements.
  • S (Satisfactory, 29 pts) – Work that meets all requirements but is over 4 pages OR is missing the NetIDs.
  • N (Not yet, 18pts) – Does not meet all requirements.
  • U (Unassessable, 6pts) –  Missing at least one section.

Part 1: Introduction and Research Questions (15 points)

Your prototype report should begin by reintroducing your topic and restating your research question(s) as in your proposal. Your research question(s) should be (1) substantial, (2) feasible, and (3) relevant. Briefly justify each of these points as in the project proposal. You can start with the text from your proposal, but you should update your introduction and research questions to reflect changes in or refinements of the project vision. Specifically, point out what has changed since the proposal, or state that there are no changes. Your introduction should be sufficient to provide context for the rest of your report. Additionally, like the proposal, your report should include a descriptive project title (e.g., “Investigating the effect of politics and race on COVID-19” is a good title, “216 Final Project Prototype” is not).

Grading

  • E (Exemplary, 15pts) – Comprehensive introduction with clearly labeled, updated research questions and a justification for the research questions about whether they are substantial, feasible, and relevant. Any changes are specifically mentioned or they note there are no changes. Includes a descriptive and relevant project title. 
  • S (Satisfactory, 14pts) – Comprehensive introduction with clearly labeled research questions and a justification for the research questions about whether they are substantial, feasible, and relevant. Changes and updates may not be specifically mentioned. May be missing a descriptive project title.
  • N (Not yet, 9pts) – Incomplete introduction where the research questions or justification are missing pieces, but at least some of it is present. Or the justification is clearly not reasonable.
  • U (Unassessable, 3pts) – Incomplete introduction where it is entirely missing the research questions or justification or does not demonstrate meaningful effort.

Part 2: Data Sources (15 points)

After your introduction and research questions, your prototype should discuss the data you have collected and are using to answer your research questions. Be specific: name the datasets you are using and where they were collected from / how they were prepared. Briefly justify why your data are appropriate and sufficient to address your research questions. As in the introduction, you can begin with the text from your proposal, but be sure to update it to fit your evolving project.

Grading

  • E (Exemplary, 15pts) – Origins of data are properly specified, cited, and relevant to answering the research question(s). If any data wrangling, cleaning, or other data preparation was done, these processes are explained.
  • S (Satisfactory, 14pts) – Origins of data are properly specified and cited. However, the justification is not clear why the data is relevant to the proposed research question(s). If any data wrangling, cleaning, or other data preparation was done, these processes are explained.
  • N (Not yet, 9pts) – Poorly specified data sources or the justification for using that data set or the methods to acquire the data is lacking. No discussion of preparing the dataset.
  • U (Unassessable, 3pts) – Data sources or methods to acquire data are missing or do not demonstrate meaningful effort.

Part 3: What Modules Are You Using? (15 points)

Your project should utilize concepts from modules we have covered or will cover in this course to answer your research question(s). We will assume you will use modules 1 (Python), 2 (Numpy/Pandas), and 5 (Probability). This section should state at least 3 more modules that you will utilize for your project. Each module should have a short description of how you will use the knowledge in this module and a justification for that use. In addition, include what concepts from the module you will use and at what stage of your project you plan to mostly use this module. Potential stages include, but are not limited to: data gathering, data cleaning, data investigation, data analysis, and final report.

  • Module 3: Visualization
  • Module 4: Data Wrangling
  • Module 6: Combining Data
  • Module 7: Statistical Inference
  • Module 8: Prediction & Supervised Machine Learning
  • Module 9: Databases and SQL
  • Module 10: Deep Learning

As in Parts 1 and 2, you can begin with the text from your proposal, but be sure to update it to fit with your evolving project. You should add any additional modules you will be using and update the existing modules to be more specific to the different tasks and stages of your projects.

Grading

  • E (Exemplary, 15pts) – States at least 3 modules. For each module, they provide an updated (1) short description of how they will use the module, (2) justification for using this module, (3) what concepts they will likely use, and (4) what stage they expect they will use it. 
  • S (Satisfactory, 14pts) – States at least 3 modules, but there are some weaknesses somewhere, such as one module having 3 or more parts that are not well fleshed out, or one part being weak across all 3 modules.
  • N (Not yet, 9pts) – States at least 3 modules, but 3 or more parts are entirely missing or basically non-existent out of 12 = 4 parts X 3 modules.
  • U (Unassessable, 3pts) – Does not meet the Not Yet criteria, such as having fewer than 3 modules or missing more than 3 parts across all 12 = 4 parts X 3 modules.

Example:

Here is an example of the proposal versus the prototype justification for Module 5 (Probability); note the differences. Assume the project is about creating a prediction model that classifies the data. Remember that this module is not on the list of modules that count as one of your 3, but you are welcome to include analysis using concepts from it. Note the bolding, which will help you ensure you meet all requirements and help your grader find them.

Proposal

Module 5 Probability: We will use this module to calculate the accuracy of a baseline version of the model we will build. We will do this by considering the proportion of the label we are trying to predict, as well as taking into account some of the independent variables. Our justification is that we need a baseline accuracy to understand how good our model is. The concepts we will mainly use are the probability axioms and maybe some of Bayes or marginalization to calculate this baseline. We plan to use this module during the data analysis and final report stage.

Prototype

Module 5 Probability: We used this module to calculate the accuracy of a baseline version of a model we will build to predict the type of a Pokemon. We did this by considering the proportion of each Pokemon type in our data set and creating a baseline model that always predicts the most common type in our data set. Our justification is that we need a baseline accuracy to understand how good our model is at predicting the type of a Pokemon based on other characteristics. The concepts we mainly used were the probability axioms and some of Bayes or marginalization to consider whether there was a better baseline model we could use. We used this module during our data analysis and plan to use it in the final report stage.
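
The baseline described in this example can be sketched in a few lines of Python. This is only an illustration, not a required approach: the function name majority_baseline_accuracy and the list of Pokemon types are hypothetical and made up for this sketch.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Return (most common label, accuracy of always predicting it)."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


# Hypothetical primary types for ten Pokemon (illustrative values only).
types = ["Water", "Water", "Normal", "Grass", "Water",
         "Fire", "Normal", "Water", "Bug", "Grass"]

label, accuracy = majority_baseline_accuracy(types)
print(f"Always predicting {label!r} is correct {accuracy:.0%} of the time")
# prints: Always predicting 'Water' is correct 40% of the time
```

Any model your group builds should beat this number to be considered useful; if it does not, that is worth discussing in your results.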

Part 4: Preliminary Results and Methods (15 points)

The preliminary results section of your report should summarize the results obtained so far in the project. Where possible, results should be summarized using clearly labeled tables or figures and supplemented with a written explanation of the significance of the results with respect to the research questions outlined in the previous section. Please note that a screenshot of your dataset does not count as a table or figure and should not be included in your Prototype (i.e., don’t screenshot the dataframe itself; try to come up with some preliminary visualizations). Instead, if your primary progress is gathering and cleaning your data, provide a table with descriptive statistics about your data. However, make sure that this table is not the only figure generated for this milestone: you should have at least begun the analysis for your research questions at this point, so try to think of ways to communicate your work through data visualizations! Your results do not need to be final or conclusive for your entire project but should demonstrate substantial effort and progress and should provide concrete proof of concept or initial analysis with respect to your research questions. Any tables and figures you generate should also be accompanied by figure labels and captions (e.g., Fig. 1: Linear Regression of XYZ Data).
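
As one possible way to produce a descriptive statistics table instead of a dataset screenshot, here is a minimal sketch using only the Python standard library. The describe helper, the column names, and the values are all hypothetical; if you are working in pandas, DataFrame.describe gives a similar summary.

```python
import statistics


def describe(values):
    """Summarize a numeric column, skipping missing entries (None)."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": round(statistics.mean(present), 2),
        "median": statistics.median(present),
        "stdev": round(statistics.stdev(present), 2),
        "min": min(present),
        "max": max(present),
    }


# Hypothetical columns (illustrative values only).
columns = {
    "height_m": [0.7, 1.0, 2.0, None, 0.6, 1.7],
    "weight_kg": [6.9, 13.0, 100.0, 8.5, None, 90.5],
}

# Give the table a label and caption, as you would in the report.
print("Table 1: Descriptive statistics for numeric columns")
for name, values in columns.items():
    print(name, describe(values))
```

Reporting the number of missing values alongside the usual statistics also helps a reader judge how much cleaning was needed.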

Your results should be specific about exactly what data were used and how the results were generated. For example, if you scraped multiple web databases, merged them, and created a visualization, then you should explain how each step was conducted in enough detail that an informed reader could reasonably be expected to reproduce your results with time and effort. Simply saying “we cleaned the data and dealt with missing values” is not sufficient detail.
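
One way to get this level of detail is to have your cleaning code record what each step did, so the numbers can be copied into your methods description. The sketch below is an assumption about how that might look, not a required workflow; clean_rows, the field names, and the rows are hypothetical.

```python
def clean_rows(rows, required):
    """Drop rows missing any required field; return kept rows and a log."""
    log = [f"Loaded {len(rows)} rows"]
    for field in required:
        before = len(rows)
        # Treat both None and the empty string as missing.
        rows = [r for r in rows if r.get(field) not in (None, "")]
        log.append(f"Dropped {before - len(rows)} rows with missing '{field}'")
    log.append(f"Kept {len(rows)} rows")
    return rows, log


# Hypothetical raw records (illustrative values only).
raw = [
    {"name": "A", "type": "Water", "height_m": 0.5},
    {"name": "B", "type": "", "height_m": 1.1},
    {"name": "C", "type": "Fire", "height_m": None},
    {"name": "D", "type": "Grass", "height_m": 0.7},
]

cleaned, log = clean_rows(raw, required=["type", "height_m"])
print("\n".join(log))
```

A statement such as “we dropped 1 of 4 rows with a missing type and 1 with a missing height, leaving 2 rows” is the kind of detail a reader needs to reproduce your work.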

Your report itself should include an explanation of your methods, but it should also contain instructions on how to access your full implementation (that is, your code, data, and any other supplemental resources like additional charts or tables). The simplest way to do so is to include a link to the Box folder, GitLab repo, or whatever other platform your group is using to house your data and code. Please make sure these links are accessible to anyone with the link, not only to users who have been added to the resources directly, so that UTAs and teaching staff can access them if needed.

Grading

  • E (Exemplary, 15pts) – Preliminary results are thoroughly discussed using labeled tables and figures followed by written descriptions. Specific explanation of how the results were generated and from what data. Link to code/data to create charts or visualizations is provided. 
  • S (Satisfactory, 14pts) – Preliminary results are thoroughly discussed using labeled tables and figures followed by written descriptions. Explanation of how the results were generated may lack some specification or it is somewhat unclear as to what data the results are from. Link provided.
  • N (Not yet, 9pts) – Preliminary results are discussed using tables with missing labels or lacking written descriptions. It is unclear how the results were generated and from what data.
  • U (Unassessable, 3pts) – Preliminary results are missing or do not demonstrate meaningful effort.

Part 5: Reflection and Next Steps (10 points)

In this part, you should address each of the following in its own subsection (if space is limited, how you create clear subsections is up to you):

  1. Successes/Mostly Complete – What has been successful in the project so far or what is essentially complete and ready for the final report?
  2. Challenges/Incomplete – What has been challenging in the project so far or what is incomplete in the prototype that needs to be finished for the final report?
  3. Collaboration plan reflection – How is the collaboration going? What is currently happening versus the original proposed plan? Is the group okay with what is happening? Does the group need to renegotiate what the plan should be? If yes, what is the new plan?
  4. Next Steps – What are your next steps? These should be concrete and specific actions that your group will take to address the challenges identified in order to complete a successful final project.

Grading

  • E (Exemplary, 10pts) – All four parts are present: a comprehensive reflection on successes and challenges so far, a reflection on the collaboration plan, and a specific plan of action to address any concerns and future work.
  • S (Satisfactory, 9pts) – All four parts are present and the reflection is comprehensive on successes and challenges so far, but the collaboration plan is weak and there is only a loose plan of action to address any concerns and future work.
  • N (Not yet, 6pts) – A reflection/plan that does not entirely answer 1 or 2 of the questions above.
  • U (Unassessable, 2pts) – A reflection/plan that does not entirely answer 3 of the questions above.

Checklist Before You Submit:

  1. Does your prototype satisfy all general directions?
    1. 3-4 pages in length
    2. Standard margins (1 in.)
    3. Font size is 11-12 pt
    4. Line spacing is 1-1.5
    5. Final document is a pdf
  2. Do you have a descriptive project title, Introduction and clearly stated Research Question(s)?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  3. Have you properly specified/cited one or more specific Data Sources and justified why they are relevant to the research Questions?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  4. Did you state at least 3 Modules to be used and how, as well as a justification of which concepts will be used at specific stages of the project?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  5. Have you reported all of your Preliminary Results and Methods, including a specific explanation of how the results were generated?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  6. Have you included all relevant tables and figures with appropriate labeling and captions?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  7. Have you written a comprehensive reflection?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?

Project: Proposal

Due: Friday, October 11th (late due Saturday, October 12th)

To see an example proposal, you can find it in the class Box folder called Projects.

General Directions

The purpose of this document is to prepare your team for success in the course project. You should have feedback from your Initial Plan on the different research topics you have explored and are now introducing your chosen topic. Your proposal should contain at least three parts, which we define below. In terms of length, it should be 1.5-3 pages (2 pages is typical) using standard margins (1 in.), font (11-12 pt), and line spacing (1-1.5). In addition to these three components, you should provide any additional context or information necessary to understand your vision for your project. You should convert your final document to a PDF and upload it to Gradescope under the assignment “Project Proposal” by the due date. Be sure to include your names and NetIDs in your final document and use the group submission feature on Gradescope to include all of your group members in a single submission.

The proposal is out of 100 points. Meeting basic formatting requirements is worth 40 points and will be graded as follows:

  • E (Exemplary, 40pts) – Work that meets all requirements. All names and netIDs of group members present
  • S (Satisfactory, 38pts) – Work that meets most requirements. Some names and/or netIDs are missing.
  • N (Not yet, 24pts) – Does not meet all requirements.
  • U (Unassessable, 8pts) –  Missing at least one section.

Part 1: Introduction and Research Questions (20 points)

Your proposal should begin by introducing your topic in general and then defining one or more research questions. Research questions are the guiding questions you want to answer or problems you want to solve in your project. If you are unsure of how to come up with strong research questions for this project or want some more direction, you can look over our guide for formulating a research question. Your research question(s) should be (1) substantial, (2) feasible, and (3) relevant.

  1. Substantial research questions require more than a surface-level analysis (more than just computing basic summary statistics on readily available datasets, for example).
  2. Feasible research questions can actually be addressed by four or five team members over the course of approximately six weeks using data you can access.
  3. Relevant research questions address a subject of importance and interest within the scientific community or broader society. Additionally, we are looking for why your group believes this research project is worth your time in this course.

You should provide a brief justification of your research question(s) with respect to each of these three points. We recommend clearly marking this section by bolding the words substantial, feasible, and relevant when you provide your justification.

Remember to review the feedback you received from your Initial Plan and decide on a topic and research questions that meet the criteria above and spark interest in your group. This is a project that you will be working on for a significant portion of the semester.

Grading

  • E (Exemplary, 20pts) – Comprehensive introduction with clearly labeled research questions. It includes a justification for the research questions about whether they are substantial, feasible, and relevant. The justification is reasonable and clear in relevance to a CS216 project.
  • S (Satisfactory, 19pts) – Comprehensive introduction with clearly labeled research questions. It includes a justification for the research questions about whether they are substantial, feasible, and relevant. However, the justification is lacking in clarity or reasonableness in relevance to a CS216 project.
  • N (Not yet, 12pts) – Incomplete introduction where the research questions or justification are missing pieces, but at least some of it is present, or the justification is clearly not reasonable.
  • U (Unassessable, 4pts) – Incomplete introduction where it is entirely missing the research questions or justification or does not demonstrate meaningful effort.

Part 2: Data Sources (20 points)

Your project should deal with real data. We provide pointers to some data sources in the Project Ideas section of the group formation post, but you are welcome and encouraged to look for your own data sources. After your introduction and research questions, your proposal should discuss the data you will use to answer your research questions. Be as specific as possible: name the datasets you will use and how you will access them or specify where you will look for the relevant datasets and why you expect to be successful in finding them. You should also briefly justify why the data you plan to obtain will be relevant and appropriate for addressing your research questions. Searching for data sources as you refine your research questions is likely to be the most time-consuming part of preparing your proposal and is crucial for a good start on your project, so do not put it off.

Grading

  • E (Exemplary, 20pts) – Origins of data or methods to acquire data are properly specified, cited, and relevant to answering the research question(s). If the data is not already available, the justification for why they expect they will have access to it soon is reasonable. (In other words, we are reasonably confident you’ll be able to get the data you need for your research questions.)
  • S (Satisfactory, 19pts) – Origins of data or methods to acquire data are properly specified and cited. However, the justification is not clear as to why the data is relevant to the proposed research question(s) OR the justification of why they expect they will have access to the data is not reasonable. (In other words, we are not entirely sure you’ll be able to get the data you need for your research questions.)
  • N (Not yet, 12pts) – Poorly specified data sources or methods to acquire data OR the justification for using that data set or the methods to acquire the data is lacking.
  • U (Unassessable, 4pts) – Data sources or methods to acquire data are missing or do not demonstrate meaningful effort.

Part 3: What Modules Are You Using? (20 points)

Your project should utilize concepts from modules we have covered or will cover in this course to answer your research question(s). We will assume you will use the skills you have acquired from modules 1 (Python), 2 (Numpy/Pandas), and 5 (Probability). This section should state at least 3 more modules that you will utilize for your project. Each module should have a short description of how you will use the knowledge in this module and a justification for that use. In addition, include what concepts from the module you will use and at what stage of your project you plan to mostly use this module. Potential stages include, but are not limited to: data gathering, data cleaning, data investigation, data analysis, and final report.

  • Module 3: Visualization
  • Module 4: Data Wrangling
  • Module 6: Combining Data
  • Module 7: Statistical Inference
  • Module 8: Prediction & Supervised Machine Learning
  • Module 9: Databases and SQL
  • Module 10: Deep Learning

When the proposal is due, you may not yet have learned the material from some of the modules above. In this case, you should still list the applicable modules with a description of the concepts you believe they will cover that will be useful for answering your research question.

If you do not plan to use Python, NumPy, and pandas for your project, you must state this and explain why you are choosing not to. It is okay to use something else, like R, but keep in mind that the teaching staff may not have the skills to support you.

Grading

  • E (Exemplary, 20pts) – States at least 3 modules. For each module they provide a (1) short description of how they will use the module, (2) justification for using this module, (3) what concepts they will likely use, and (4) what stage they expect they will use it.
  • S (Satisfactory, 19pts) – States at least 3 modules, but there are some weaknesses somewhere, such as one module having 3 or more parts that are not well fleshed out, or one part being weak across all 3 modules.
  • N (Not yet, 12pts) – States at least 3 modules, but 3 or more parts are entirely missing or basically non-existent out of 12 = 4 parts X 3 modules.
  • U (Unassessable, 4pts) – Does not meet the Not Yet criteria, such as having fewer than 3 modules or missing more than 3 parts across all 12 = 4 parts X 3 modules.

Example:

Here is an example justification for Module 5, assuming the project is about creating a prediction model that classifies the data. Remember that this module is not on the list of modules that count as one of your 3, but you are welcome to include analysis using concepts from it. Note the bolding, which will help you ensure you are meeting all requirements and help your grader find them.

Module 5 Probability: We will use this module to calculate the accuracy of a baseline version of the model we will build. We will do this by considering the proportion of the label we are trying to predict, as well as taking into account some of the independent variables. Our justification is that we need a baseline accuracy to understand how good our model is. The concepts we will mainly use are the probability axioms and maybe some of Bayes or marginalization to calculate this baseline. We plan to use this module during the data analysis and final report stage.
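
To make “taking into account some of the independent variables” concrete, here is a hedged sketch of a baseline that predicts the most common label within each value of a single feature. It is only one possible interpretation of the example above; the function name, the habitat feature, and the data are hypothetical. Note that it measures accuracy on the same data used to pick the predictions, so the number is optimistic; a real project would evaluate on held-out data.

```python
from collections import Counter, defaultdict


def grouped_baseline_accuracy(features, labels):
    """Accuracy of predicting the most common label within each feature value.

    This is in-sample accuracy: predictions are chosen and scored on the
    same rows, so treat the result as an upper bound for this baseline.
    """
    by_value = defaultdict(Counter)
    for value, label in zip(features, labels):
        by_value[value][label] += 1
    # For each feature value, the majority label is correct `count` times.
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(labels)


# Hypothetical feature and label columns (illustrative values only).
habitat = ["sea", "sea", "sea", "forest", "forest", "cave"]
label = ["Water", "Water", "Ice", "Grass", "Bug", "Rock"]

overall = Counter(label).most_common(1)[0][1] / len(label)
grouped = grouped_baseline_accuracy(habitat, label)
print(f"Overall majority baseline: {overall:.2f}")
print(f"Per-habitat majority baseline: {grouped:.2f}")
```

Comparing the two numbers shows how much a single variable already tells you about the label, which is useful context when you later report your model's accuracy.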

Checklist Before You Submit:

  1. Does your proposal satisfy all general directions?
    1. 1.5-3 pages in length
    2. Standard margins (1 in.)
    3. Font size is 11-12 pt
    4. Line spacing is 1-1.5
    5. Final document is a PDF
    6. Names and netIDs
  2. Do you have an Introduction and clearly stated Research Question(s)?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  3. Have you properly specified/cited one or more specific Data Sources or methods to acquire data and justified why they are relevant to the Research Questions?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  4. Did you state at least 3 Modules to be used and how, as well as a justification of which concepts will be used at specific stages of the project?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?