Project Report
Due: Friday, 12/8
Demo report available in Box (Project Resources folder)
If you want written feedback beyond the rubric markings, fill out this form. If you do not fill out the form, we will assume you do not want written feedback.
General Directions
The final report is intended to provide a comprehensive account of your collaborative course project in data science. The report should demonstrate your ability to apply the data science skills you have learned to a real-world project in a holistic way from posing research questions and gathering data to analysis, visualization, interpretation, and communication. The report should stand on its own so that it makes sense to someone who has not read your proposal or prototype.
The report should contain at least the parts defined below. In terms of length, it should be 5-7 pages using standard margins (1 in.), font (11-12 pt), and line spacing (1-1.5). A typical submission is around 3-4 pages of text and 5-7 pages overall with tables and figures. It is important to stay within the page limit, as being succinct is an important skill to practice. Your final report should also have a descriptive title, not “CS216 Project Report”. You should convert your written report to a PDF and upload it to Gradescope under the assignment “Project Final Report” by the due date. Be sure to include your names and NetIDs in your final document and use the group submission feature on Gradescope. You do not need to upload your accompanying data, code, or other supplemental resources demonstrating your work to Gradescope; instead, your report should contain instructions on how to access these resources (see the Results and Methods section below for more details).
In general, your approach to this report should be to write as if you had “planned this as your project all along.” A report is not a chronological story of your project. It is a summary of what you did where the “story” serves the reader’s comprehension.
Grading
- E (Exemplary, 20pts) – Work that meets all requirements in terms of formatting and sections.
- S (Satisfactory, 19 pts) – Work that meets all requirements but is over 7 pages.
- N (Not yet, 12pts) – Does not meet all requirements.
- U (Unassessable, 4pts) – Missing at least one section.
Part 1: Introduction and Research Questions (15 points)
Your final report should begin by introducing your topic and restating your research question(s) as in your proposal. As before, your research question(s) should be (1) substantial, (2) feasible, and (3) relevant. In contrast to the prior reports, the final report does not need to explicitly justify that the research questions are substantial and feasible in the text; your results should demonstrate both of these points. Therefore, you should remove that text to save space.
You should still explicitly justify how your research questions are relevant. In other words, be sure to explain the motivation for your research questions. Remember that relevant research questions address a subject of importance and interest within the scientific community or broader society. Additionally, we are looking for why your group believes this research project is worth your time in this course.
You can start with the text from your prototype, but you should update your introduction and research questions to reflect changes in or refinements of the project vision. Do not state specific updates; instead, write the report as the final product, as if the prior milestones did not exist and your readers were unaware of them. If you feel that an explanation of changes since the prototype is warranted, place it in the appendix. Your introduction should be sufficient to provide context for the rest of your report.
Grading
- E (Exemplary, 15pts) – Comprehensive introduction with clearly labeled, up-to-date research questions and a justification for how the research questions are relevant. The introduction can stand alone without references to prior versions of the project, and it contains no explicit justification that the research question(s) are “substantial” or “feasible”.
- S (Satisfactory, 14pts) – Comprehensive introduction with clearly labeled, updated research questions and a justification for how the research questions are relevant. The introduction and research questions may not have been refined from the prototype (for example, the text still explains why the research questions are substantial and feasible).
- N (Not yet, 9pts) – Incomplete introduction where the research questions or justification are missing pieces, but at least some of it is present. Or the justification is clearly not reasonable.
- U (Unassessable, 3pts) – Introduction is entirely missing the research questions or the justification, or does not demonstrate meaningful effort.
Part 2: Data Sources (15 points)
Discuss the data you have collected and are using to answer your research questions. Be specific: name the datasets you are using, the information they contain, and where they were collected from / how they were prepared. You can begin with the text from your prototype, but be sure to update it to fit the vision for your final project.
Grading
- E (Exemplary, 15pts) – Origins of data are properly specified, cited, and relevant to answering the research question(s). If any significant data wrangling, cleaning, or other data preparation was done, these processes are explained.
- S (Satisfactory, 14pts) – Origins of data are properly specified and cited. However, it is not clear why the data is relevant to the proposed research question(s). If any significant data wrangling, cleaning, or other data preparation was done, these processes are explained.
- N (Not yet, 9pts) – Data sources are poorly specified, or the justification for using the dataset or the description of the methods used to acquire the data is lacking. No discussion of preparing the dataset.
- U (Unassessable, 3pts) – Data sources or methods to acquire data are missing or do not demonstrate meaningful effort.
Part 3: What Modules Are You Using? (15 points)
Your project should utilize concepts from modules we have covered in this course to answer your research question(s). We will assume you will use modules 1 (Python), 2 (Numpy/Pandas), and 5 (Probability). Your final report should state at least 3 more modules that you have utilized for your project. Each module should have a short description of how you used the knowledge in this module and a justification for that use. In addition, include what concepts from the module you used and at what stage of your project you mostly used this module. Potential stages include, but are not limited to: data gathering, data cleaning, data investigation, data analysis, and final report.
- Module 3: Visualization
- Module 4: Data Wrangling
- Module 6: Combining Data
- Module 7: Statistical Inference
- Module 8: Prediction & Supervised Machine Learning
- Module 9: Databases and SQL
- Module 10: Deep Learning
Your overall report should clearly show that you used the modules discussed in this section. You should add any additional modules used and update the existing modules to be more specific to the different tasks and stages of your projects that changed since your prototype.
Grading
- E (Exemplary, 15pts) – States at least 3 modules. For each module, they provide an updated (1) short description of how they used the module, (2) justification for using this module, (3) what concepts they used, (4) what stage they used it, and (5) clearly implemented it in the final report.
- S (Satisfactory, 14pts) – States at least 3 modules. For each module, they provide an updated (1) short description of how they used the module, (2) justification for using this module, (3) what concepts they used and (4) what stage they used it. Fewer than 3 modules are clearly implemented in the final report.
- N (Not yet, 9pts) – States at least 3 modules. For each module, they provide an updated (1) short description of how they used the module, (2) justification for using this module, (3) what concepts they used and (4) what stage they used it. Only one module is clearly implemented in the final report.
- U (Unassessable, 3pts) – Does not meet the Not Yet criteria.
Part 4: Results and Methods (15 points)
This is likely to be the longest section of your paper at multiple pages. The results and methods section of your report should explain your detailed results and the methods used to obtain them. Where possible, results should be summarized using clearly labeled tables or figures and supplemented with written explanations of the significance of the results with respect to the research questions outlined previously. Please note that a screenshot of your dataset does not count as a table or figure and should not be included in your final report.
Your description of your methods should be specific. For example, if you scraped multiple web databases, merged them, and created a visualization, then you should explain how each step was conducted in enough detail that an informed reader could reasonably be expected to reproduce your results with time and effort. Just saying “we cleaned the data and dealt with missing values” or “we built a predictive model” is not sufficient detail.
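One way to make a cleaning description specific is to record what each step did while you run it, then report those counts in the text. The sketch below is only an illustration in plain Python: the records, the column names (county, year, filings), and the clean function are all hypothetical and not tied to any course dataset or required workflow.

```python
# Hypothetical raw records; column names and values are made up for illustration.
raw_rows = [
    {"county": "Durham", "year": "2019", "filings": "12"},
    {"county": "Durham", "year": "2019", "filings": "12"},  # exact duplicate
    {"county": "Wake", "year": "2020", "filings": ""},       # missing value
    {"county": "Orange", "year": "2020", "filings": "7"},
    {"county": "Wake", "year": "2019", "filings": "n/a"},    # unparseable value
]


def clean(rows):
    """Return (cleaned_rows, log), where log records the row count after each step."""
    log = [("raw rows", len(rows))]

    # Step 1: drop exact duplicate rows, keeping the first occurrence.
    seen = set()
    deduped = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            deduped.append(row)
    log.append(("after dropping exact duplicates", len(deduped)))

    # Step 2: keep only rows whose 'filings' field is a non-negative integer,
    # converting 'year' and 'filings' to int along the way.
    cleaned = []
    for row in deduped:
        value = row["filings"].strip()
        if value.isdigit():
            cleaned.append({**row, "year": int(row["year"]), "filings": int(value)})
    log.append(("after dropping rows with missing or non-numeric filings", len(cleaned)))
    return cleaned, log


cleaned_rows, cleaning_log = clean(raw_rows)
for step, count in cleaning_log:
    print(f"{step}: {count}")
```

With a log like this, the methods text can state exactly which rows were removed, in what order, and why, instead of saying only that the data was cleaned.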
Your report should also contain instructions on how to access your full implementation (that is, your code, data, and any other supplemental resources like additional charts or tables). The simplest way to do so is to include a link to the Box folder, GitLab repo, or whatever other platform your group is using to house your data and code.
Grading
- E (Exemplary, 15pts) – Results are thoroughly discussed using labeled tables or figures followed by written descriptions. Specific explanation of how the results were generated and from what data. Link to code/data to create charts or visualizations is provided.
- S (Satisfactory, 14pts) – Results are thoroughly discussed using labeled tables or figures followed by written descriptions. Explanation of how the results were generated may lack some specification or it is somewhat unclear as to what data the results are from. Link provided.
- N (Not yet, 9pts) – Results are discussed using tables with missing labels or lacking written descriptions. It is unclear how the results were generated and from what data.
- U (Unassessable, 3pts) – Results are missing or do not demonstrate meaningful effort.
Part 5: Limitations and Future Work (10 points)
In this part, you should discuss any important limitations or caveats to your results with respect to answering your research questions. For example, if you don’t have as much data as you would like or are unable to fairly evaluate the performance of a predictive model, explain and contextualize those limitations.
Finally, provide a brief discussion of future work. This could explain how future research might address the limitations you outline, or it could pose additional follow-up research questions based on your results so far. In short, explain how an informed reader (such as a peer in the class) could improve on and extend your results.
Grading
- E (Exemplary, 10pts) – Comprehensive and explicit discussion of important limitations and caveats to results. Brief discussion of future work and how results could be extended and improved upon.
- S (Satisfactory, 9pts) – Discussion of important limitations and caveats to results could be improved or the discussion of future work and how results could be extended and improved upon lacks some specification.
- N (Not yet, 6pts) – Incomplete discussion of important limitations and caveats to results. Discussion of future work and how results could be extended and improved upon may lack some specification.
- U (Unassessable, 2pts) – Limitations and future work are missing or do not demonstrate meaningful effort.
Part 6: Conclusion (10 points)
Provide a brief (one or two paragraphs) summary of your results. This summary of results should address all of your research questions.
Example
If one of your research questions was “Did COVID-19 result in bankruptcy in North Carolina during 2020?” then a possible (and purely hypothetical) summary of results might be:
We aggregate the public records disclosures of small businesses in North Carolina from January 2019 to December 2020 and find substantial evidence that COVID-19 did result in a moderate increase in bankruptcy during 2020. This increase is not geographically uniform and is concentrated during summer and fall 2020. We also examined the impact of federal stimulus but cannot provide an evaluation of its impact from the available data.
Grading
- E (Exemplary, 10pts) – Research questions are clearly and completely addressed through a summary of results.
- S (Satisfactory, 9pts) – Research questions are clearly addressed through a summary of results. The results may be lacking in completely answering the research questions.
- N (Not yet, 6pts) – Research questions are somewhat addressed through a summary of results. The results are lacking in completely answering the research questions. Or the results for one of the research questions are missing.
- U (Unassessable, 2pts) – Conclusion is missing or does not demonstrate meaningful effort.
(Optional) Part 7: Appendix of additional figures, tables, and updates summary.
If you are struggling to keep your report within the 5-7 page limit, you may move some (not all) of your figures and tables to an optional appendix that will not count against your page limit. However, your report should stand on its own without the appendix. The appendix is for adding more nuance to your results, not to give you more space to talk about your results. Succinctness is an important skill to practice when doing data science. Your grader is not expected to look at the appendix when grading.
If you strongly feel like a summary of project updates since the proposal is required, you may put them in this appendix as well and mention they are in the appendix in the introduction.
Checklist Before You Submit:
- Does your final report satisfy all general directions?
- 5-7 pages in length
- Standard margins (1 in.)
- Font size is 11-12 pt
- Line spacing is 1-1.5
- Final document is a PDF
- Do you have an Introduction and clearly stated Research Question(s)?
- Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
- Have you properly specified/cited one or more specific Data Sources and justified why they are relevant to the Research Question(s)?
- Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
- Did you state at least 3 Modules and, for each one, how you used it, why you used it, which concepts you used, and at what stage of the project you used it?
- Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
- Have you reported all of your Results and Methods, including a specific explanation of how the results were generated?
- Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
- Have you defined clear Limitations to your results and Future Work?
- Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
- Have you written a comprehensive Conclusion?
- Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?