Month: November 2023

(In-person) Exam Retakes

This post outlines what the in-person exam retakes will be like.

Let us know you are coming by filling out the retake exam form (this helps us save paper).

  • When: Thursday 11/14, 2-5 pm
    • Exam 1 Retake: 2-2:55 pm
    • Exam 2 Retake: 3-3:55 pm
    • Exam 3 Retake: 4-4:55 pm
    • (We will likely have a standard deviation of 5 minutes around these times, but we will not start early)
  • Where: Physics 130 (usual lecture room)
  • There are three in-person exam retakes, one for each midterm exam.
  • Each retake will cover only the modules that the original exam covered. See the calendar or the original exam logistics posts for that information.
  • You are allowed one helper sheet per exam. If you bring several sheets for different exams, you may have only one out at a time: the one for the exam you are currently taking.
  • Each exam will be 55 minutes long.
    • If you have an SDAO accommodation, you need to schedule a time with the testing center. To determine how long to book, multiply 55 minutes by your extra-time factor (e.g., 1.5 for time-and-a-half) and by the number of exams you plan to retake. We will merge the exam pdfs into a single “exam” retake.
  • Each exam has fewer questions to fit this new time limit. Points were redistributed so that the assessed concepts are still worth about the same.
  • There will be no regrade window because of the turnaround time required for submitting grades. If you wish to discuss your grade, you must email Prof. Stephens-Martinez.

Specific information

  • Exam 1 Retake
  • Exam 2 Retake
    • Bring your calculator. You may also have it out for the Exam 1 and Exam 3 Retakes, but it is not required for those.
    • We will provide a reference sheet again.
    • See Exam 2 logistics for any other information.
  • Exam 3 Retake
    • Bring a calculator if you want to.
    • The front of the exam will have the precision, recall, and accuracy formulas for your reference.
    • See Exam 3 logistics for any other information.

Module 10: Deep Learning

  1. Prepare (due Monday 11/27)
    1. Content below
    2. Canvas quizzes
  2. Peer Instructions – See on the class forum
  3. Homework (due Sunday 12/3) [Link]
  4. There are no worked examples

Content (Box)

10 Deep Learning

  1. Neural Networks and Applications (16 min.)
  2. Forward Propagation (10 min.)
  3. Gradient Descent (14 min.)
  4. Back Propagation (11 min.)
  5. Convolutional Neural Network (15 min.)
  6. Introducing PyTorch (23 min.)

Optional Supplements

PyTorch

Unlike most other libraries for this course, PyTorch is not included in the basic Anaconda installation. To use PyTorch, we suggest one of two options.

  • Install PyTorch locally (for free). Directions are on the PyTorch website: select the stable build, your operating system, Conda (for Anaconda), Python, and CPU to see install directions for your particular setup. (CUDA supports hardware acceleration on NVIDIA graphics cards and is not necessary for this course.)
  • Use PyTorch in a Jupyter notebook in the cloud (also for free). If you have a Google account, the easiest way to do this is with a Google Colab notebook; PyTorch is already available in this cloud environment.
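After installing, one way to confirm that PyTorch works (locally or in Colab) is a small script like the sketch below. It uses `importlib.util.find_spec` from the standard library plus a few basic `torch` calls; the function name `pytorch_status` is our own illustration, not part of PyTorch. On a CPU-only install, the reported device should be `cpu`, which is all this course needs.

```python
# A quick sanity check (a sketch, not an official PyTorch script) that
# PyTorch is importable and can run a tiny tensor operation.
import importlib.util


def pytorch_status() -> str:
    """Describe whether PyTorch is installed and which device it can use."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment."
    import torch  # imported only when it is available

    # Summing a small tensor confirms the install actually works.
    x = torch.tensor([1.0, 2.0, 3.0])
    total = x.sum().item()
    # torch.cuda.is_available() is False on CPU-only installs.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return f"PyTorch {torch.__version__} works (sum={total}); device: {device}"


print(pytorch_status())
```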

You can find the official PyTorch documentation here. Of particular note are the PyTorch tutorials, including PyTorch Recipes, which are small examples of common tasks.

Book

The Deep Learning book is available free online and is written by some of the leading experts in machine learning with deep artificial neural networks. It is detailed and in-depth and is intended only for those interested in learning more about deep learning theory, now or in the future; you do not need to read the book for this course.

Project: Video Submission

Project Video Presentation

Due: Friday, 12/8

Demo slides are available in Box (Project Resources folder).

If you want written feedback beyond the rubric markings, fill out this form. If you do not fill out the form, we will assume you do not want written feedback.

General Directions

The project video presentation is intended to provide a high-level overview of your project to an audience of your peers (that is, individuals who have a reasonable knowledge of data science but are not experts in your particular project topic). Presentation recordings will be made available to the entire class (through Sakai, so not available outside of the class). The presentation should demonstrate your ability to communicate the significance of your research project and interpret its findings. The presentation should stand on its own so that it makes sense to someone who has not read your proposal or prototype.

Your group should create a video recording of your presentation in which every group member speaks and in which you use a visual aid such as presentation slides. The easiest way to do this is to hold a Zoom call with all members of your project group, share your screen with your presentation slides, and record either locally or to the cloud (see Zoom recording help information). If this is not possible, you can also record portions individually and combine the recordings (though this will require additional editing work). In the end, we will ask for a URL to your complete recording. You can either provide a share link to a Zoom cloud recording, or record locally and upload your recording to Duke Box, Warpwire, or any other cloud platform where we can view the recording directly online (we should not need to download it to view it). Ensure that anyone with the link can view the recording.

In terms of length, the presentation should be between 8 and 12 minutes. You can have as many slides as are necessary, but a typical pace is 1-2 slides per minute, so 8-24 slides total would be reasonable. Your slides should prioritize well-labeled figures or visualizations and use text sparingly to emphasize important points. The text should also be large enough to read easily. When you are finished, you will submit a pdf of your slides to Gradescope under the assignment “Project Video Presentation.” Be sure to include your names and NetIDs in your final document and use the group submission feature on Gradescope. Your first slide should include the URL where we can view the recording of your presentation.

Grading

  • E (Exemplary, 10pts) – Video presentation is between 8 and 12 minutes.
  • S (Satisfactory, 9pts) – Video presentation is over 12 minutes.
  • N (Not yet, 6pts) – Video presentation does not reach 8 minutes.
  • U (Unassessable, 2pts) – Video presentation is missing or does not demonstrate meaningful effort.

Part 0: Title Slide

The very first slide of your presentation should be a title slide containing at least the following information. The title slide does not need to appear in the video recording itself.

  • A descriptive title of your project/presentation, not “CS216 Presentation Video”
  • Names of all group members
  • URL to the video recording of your presentation

Grading

  • E (Exemplary, 10pts) – Work that meets all requirements.
  • N (Not yet, 6pts) – Does not meet all requirements.

Part 1: Introduction and Research Questions

Your presentation should begin by introducing your topic generally and posing your research questions. Provide some explanation of the relevance or motivation of your research questions.

Grading

  • E (Exemplary, 20pts) – General introduction to topic and clearly defined research questions and their motivations.
  • S (Satisfactory, 19pts) – General introduction to topic and clearly defined research questions. Discussion of motivations may be missing.
  • N (Not yet, 12pts) – General introduction to topic. Research questions and motivations are not clearly defined.
  • U (Unassessable, 4pts) – Introduction and research questions are missing or do not demonstrate meaningful effort.

Part 2: Data Sources

Discuss the data you collected and used to answer your research questions. Be specific: name the datasets you are using, the information they contain, and where they were collected from/how they were prepared.

Grading

  • E (Exemplary, 20pts) – Origins of data are properly specified, cited, and include discussion of what information they contain. Any relevant data wrangling, cleaning, or other data preparation is explained.
  • S (Satisfactory, 19pts) – Origins of data are properly specified, cited, and include discussion of what information they contain. Any relevant data wrangling, cleaning, or other data preparation may be missing or could be improved.
  • N (Not yet, 12pts) – Poorly specified data sources and lack of discussion of preparing the dataset.
  • U (Unassessable, 4pts) – Discussion of data sources and data preparation are missing or do not demonstrate meaningful effort.

Part 3: Results

Describe your results. Where possible, provide well-labeled and legible charts/figures in your slides to summarize results instead of verbose text. Interpret the results in the context of your research questions. It may not be possible to describe every individual result from your project in a brief amount of time. Focus on the most important and essential results for addressing your research questions. Please note that a screenshot of your dataset does not count as a table or figure and should not be included in your video presentation.

Unlike in your final report, you generally cannot describe your methods in a short presentation in enough detail for an informed audience member to reproduce your results. Instead, focus on your results and their interpretation, and discuss methods only at the high level needed to interpret the results.

Example of Interpreting results

Do not: “When we conducted our hypothesis test, we found that p < 0.05, so our results are significant.”

Do: “Since our p-value is below our significance level of 0.05, we can conclude that Generation 1 Pokémon differ in popularity from all other Pokémon. And since the mean popularity of Generation 1 is higher than the mean of all the other Pokémon, we can conclude that Generation 1 is on average more popular.” [The slide shows the p-value]
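To make the “Do” example concrete, here is a sketch of the same two-step reasoning: a two-sided permutation test (one common choice; your project may use a different test, such as a t-test) says whether the groups differ, and the sample means then give the direction. The popularity scores are made up for illustration, and `permutation_p_value` is a hypothetical helper, not a course-provided function.

```python
import random
from statistics import mean

# Hypothetical popularity scores, made up for illustration only.
gen1 = [82, 75, 90, 68, 88, 79, 85, 91]
others = [60, 72, 55, 66, 70, 58, 64, 73, 61, 69]


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means (hypothetical helper)."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # The +1 terms keep the estimated p-value from being exactly zero.
    return (extreme + 1) / (n_resamples + 1)


alpha = 0.05
p = permutation_p_value(gen1, others)
if p < alpha:
    # Significance says the groups differ; the means give the direction.
    direction = "more" if mean(gen1) > mean(others) else "less"
    print(f"p = {p:.4f}, so Generation 1 differs; on average it is {direction} popular.")
else:
    print(f"p = {p:.4f}, so no significant difference at alpha = {alpha}.")
```

Note that the test alone only supports “different”; the comparison of means is what justifies “more popular,” mirroring the two sentences in the example.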

Grading

  • E (Exemplary, 20pts) – Most important and essential results are thoroughly discussed using labeled tables or figures followed by an interpretation of the results in the context of the research questions. 
  • S (Satisfactory, 19pts) – Results are thoroughly discussed using labeled tables or figures followed by an interpretation of the results in the context of the research questions. May be missing an important result that should have been included.
  • N (Not yet, 12pts) – Results are discussed using tables with missing labels or lacking interpretation in the context of the research questions.
  • U (Unassessable, 4pts) – Results are missing or do not demonstrate meaningful effort. 

Part 4: Limitations and Future Work

You should briefly discuss any important limitations or caveats to your results with respect to answering your research questions. For example, if you don’t have as much data as you would like or are unable to fairly evaluate the performance of a predictive model, explain and contextualize those limitations.

Finally, provide a brief discussion of future work. This could explain how future research might address the limitations you outline, or it could pose additional follow-up research questions based on your results so far. In short, explain how an informed audience member (such as a peer in the class) could improve on and extend your results.

Grading

  • E (Exemplary, 20pts) – Comprehensive and explicit discussion of important limitations and caveats to results. Brief discussion of future work and how results could be extended and improved upon.
  • S (Satisfactory, 19pts) – Comprehensive and explicit discussion of important limitations and caveats to results. Discussion of future work and how results could be extended and improved upon may lack some specification.
  • N (Not yet, 12pts) – Incomplete discussion of important limitations and caveats to results. Discussion of future work and how results could be extended and improved upon may lack some specification.
  • U (Unassessable, 4pts) – Limitations and future work are missing or do not demonstrate meaningful effort.

Checklist Before You Submit:

  1. Is your video presentation between 8 and 12 minutes in length?
  2. Does your first slide satisfy all requirements?
    1. A title of your project/presentation
    2. Names of all group members
    3. URL to the video recording of your presentation
  3. Do you have an Introduction and clearly stated Research Question(s)?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  4. Have you properly specified/cited one or more specific Data Sources and justified why they are relevant to the Research Question(s)?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  5. Have you reported all of your important Results, including an interpretation of them in the context of the research questions?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  6. Have you defined clear Limitations to your results and Future Work?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?

 

Project: Final Report

Project Report

Due: Friday, 12/8

Demo report is available in Box (Project Resources folder).

If you want written feedback beyond the rubric markings, fill out this form. If you do not fill out the form, we will assume you do not want written feedback.

General Directions

The final report is intended to provide a comprehensive account of your collaborative course project in data science. The report should demonstrate your ability to apply the data science skills you have learned to a real-world project in a holistic way from posing research questions and gathering data to analysis, visualization, interpretation, and communication. The report should stand on its own so that it makes sense to someone who has not read your proposal or prototype.

The report should contain at least the parts defined below. In terms of length, it should be 5-7 pages using standard margins (1 in.), font (11-12 pt), and line spacing (1-1.5). A typical submission is around 3-4 pages of text and 5-7 pages overall with tables and figures. It is important to stay within the page limit, as practicing being succinct is an important skill. Your final report should also have a descriptive title, not “CS216 Project Report”. You should convert your written report to a pdf and upload it to Gradescope under the assignment “Project Final Report” by the due date. Be sure to include your names and NetIDs in your final document and use the group submission feature on Gradescope. You do not need to upload your accompanying data, code, or other supplemental resources demonstrating your work to Gradescope; instead, your report should contain instructions on how to access these resources (see the Results and Methods section below for more details).

In general, your approach to this report should be to write as if you had “planned this as your project all along.” A report is not a chronological story of your project. It is a summary of what you did where the “story” serves the reader’s comprehension.

Grading

  • E (Exemplary, 20pts) – Work that meets all requirements in terms of formatting and sections.
  • S (Satisfactory, 19pts) – Work that meets all requirements but is over 7 pages.
  • N (Not yet, 12pts) – Does not meet all requirements.
  • U (Unassessable, 4pts) – Missing at least one section.

Part 1: Introduction and Research Questions (15 points)

Your final report should begin by introducing your topic and restating your research question(s) as in your proposal. As before, your research question(s) should be (1) substantial, (2) feasible, and (3) relevant. In contrast to the prior reports, the final report does not need to explicitly justify that the research questions are substantial and feasible in the text; your results should demonstrate both of these points. Therefore, you should remove that text to save space.

You should still explicitly justify how your research questions are relevant. In other words, be sure to explain the motivation of your research questions. Remember that relevant research questions address a subject of importance and interest within the scientific community or broader society. Additionally, explain why your group believes this research project is a worthwhile use of your time in this course.

You can start with the text from your prototype, but you should update your introduction and research questions to reflect changes in or refinements of the project vision. Do not describe specific updates; rather, write the report as the final product, as if the prior milestones did not exist. Assume readers are unaware of the prior milestones. If you feel an explanation of changes since the prototype is warranted, place it in the appendix. Your introduction should be sufficient to provide context for the rest of your report.

Grading

  • E (Exemplary, 15pts) – Comprehensive introduction with clearly labeled, up-to-date research questions and a justification for how the research questions are relevant. The introduction stands alone without references to prior versions of the project, and it contains no explicit justifications that the research question(s) are “substantial” and “feasible.”
  • S (Satisfactory, 14pts) – Comprehensive introduction with clearly labeled, updated research questions and a justification for how the research questions are relevant. The introduction and research questions may not have been refined from the prototype (for example, they still include reasoning for why the research questions are substantial and feasible).
  • N (Not yet, 9pts) – Incomplete introduction where the research questions or justification are missing pieces, but at least some of it is present. Or the justification is clearly not reasonable.
  • U (Unassessable, 3pts) – Incomplete introduction where it is entirely missing the research questions or justification or does not demonstrate meaningful effort.

Part 2: Data Sources (15 points)

Discuss the data you have collected and are using to answer your research questions. Be specific: name the datasets you are using, the information they contain, and where they were collected from / how they were prepared. You can begin with the text from your prototype, but be sure to update it to fit the vision for your final project.

Grading

  • E (Exemplary, 15pts) – Origins of data are properly specified, cited, and relevant to answering the research question(s). If any significant data wrangling, cleaning, or other data preparation was done, these processes are explained.
  • S (Satisfactory, 14pts) – Origins of data are properly specified and cited. However, it is not clear why the data is relevant to the proposed research question(s). If any significant data wrangling, cleaning, or other data preparation was done, these processes are explained.
  • N (Not yet, 9pts) – Poorly specified data sources or the justification for using that data set or the methods to acquire the data is lacking. No discussion of preparing the dataset.
  • U (Unassessable, 3pts) – Data sources or methods to acquire data are missing or do not demonstrate meaningful effort.

Part 3: What Modules Are You Using? (15 points)

Your project should use concepts from modules we have covered in this course to answer your research question(s). We will assume you will use modules 1 (Python), 2 (NumPy/Pandas), and 5 (Probability). Your final report should state at least 3 more modules that you have used for your project. Each module should have a short description of how you used the knowledge in this module and a justification for that use. In addition, include which concepts from the module you used and at which stage of your project you mostly used this module. Potential stages include, but are not limited to: data gathering, data cleaning, data investigation, data analysis, and final report.

  • Module 3: Visualization
  • Module 4: Data Wrangling
  • Module 6: Combining Data
  • Module 7: Statistical Inference
  • Module 8: Prediction & Supervised Machine Learning
  • Module 9: Databases and SQL
  • Module 10: Deep Learning

Your overall report should clearly show that you used the modules discussed in this section. You should add any additional modules used and update the existing modules to be more specific to the different tasks and stages of your projects that changed since your prototype.

Grading

  • E (Exemplary, 15pts) – States at least 3 modules. For each module, they provide an updated (1) short description of how they used the module, (2) justification for using this module, (3) what concepts they used, (4) what stage they used it, and (5) clearly implemented it in the final report. 
  • S (Satisfactory, 14pts) – States at least 3 modules. For each module, they provide an updated (1) short description of how they used the module, (2) justification for using this module, (3) what concepts they used and (4) what stage they used it. Fewer than 3 modules are clearly implemented in the final report.
  • N (Not yet, 9pts) – States at least 3 modules. For each module, they provide an updated (1) short description of how they used the module, (2) justification for using this module, (3) what concepts they used and (4) what stage they used it. Only one module is clearly implemented in the final report.
  • U (Unassessable, 3pts) – Does not meet the Not Yet criteria.

Part 4: Results and Methods (15 points)

This is likely to be the longest section of your report, spanning multiple pages. The results and methods section should explain your detailed results and the methods used to obtain them. Where possible, results should be summarized using clearly labeled tables or figures and supplemented with written explanations of the significance of the results with respect to the research questions outlined previously. Please note that a screenshot of your dataset does not count as a table or figure and should not be included in your final report.

Your description of your methods should be specific. For example, if you scraped multiple web databases, merged them, and created a visualization, then you should explain how each step was conducted in enough detail that an informed reader could reasonably be expected to reproduce your results with time and effort. Just saying “we cleaned the data and dealt with missing values” or “we built a predictive model” is not sufficient detail.

Your report should also contain instructions on how to access your full implementation (that is, your code, data, and any other supplemental resources like additional charts or tables). The simplest way to do so is to include a link to the Box folder, GitLab repo, or whatever other platform your group is using to house your data and code.

Grading

  • E (Exemplary, 15pts) – Results are thoroughly discussed using labeled tables or figures followed by written descriptions. Specific explanation of how the results were generated and from what data. Link to code/data to create charts or visualizations is provided. 
  • S (Satisfactory, 14pts) – Results are thoroughly discussed using labeled tables or figures followed by written descriptions. Explanation of how the results were generated may lack some specification or it is somewhat unclear as to what data the results are from. Link provided.
  • N (Not yet, 9pts) – Results are discussed using tables with missing labels or lacking written descriptions. It is unclear how the results were generated and from what data.
  • U (Unassessable, 3pts) – Results are missing or do not demonstrate meaningful effort.

Part 5: Limitations and Future Work (10 points)

In this part, you should discuss any important limitations or caveats to your results with respect to answering your research questions. For example, if you don’t have as much data as you would like or are unable to fairly evaluate the performance of a predictive model, explain and contextualize those limitations.

Finally, provide a brief discussion of future work. This could explain how future research might address the limitations you outline, or it could pose additional follow-up research questions based on your results so far. In short, explain how an informed reader (such as a peer in the class) could improve on and extend your results.

Grading

  • E (Exemplary, 10pts) – Comprehensive and explicit discussion of important limitations and caveats to results. Brief discussion of future work and how results could be extended and improved upon.
  • S (Satisfactory, 9pts) – Discussion of important limitations and caveats to results could be improved or the discussion of future work and how results could be extended and improved upon lacks some specification.
  • N (Not yet, 6pts) – Incomplete discussion of important limitations and caveats to results. Discussion of future work and how results could be extended and improved upon may lack some specification.
  • U (Unassessable, 2pts) – Limitations and future work are missing or do not demonstrate meaningful effort.

Part 6: Conclusion (10 points)

Provide a brief (one or two paragraphs) summary of your results. This summary of results should address all of your research questions.

Example

If one of your research questions was “Did COVID-19 result in bankruptcy in North Carolina during 2020?” then a possible (and purely hypothetical) summary of results might be:

We aggregate the public records disclosures of small businesses in North Carolina from January 2019 to December 2020 and find substantial evidence that COVID-19 did result in a moderate increase in bankruptcy during 2020. This increase is not geographically uniform and is concentrated during summer and fall 2020. We also examined the impact of federal stimulus but cannot provide an evaluation of its impact from the available data.

Grading

  • E (Exemplary, 10pts) – Research questions are clearly and completely addressed through a summary of results. 
  • S (Satisfactory, 9pts) – Research questions are clearly addressed through a summary of results. The results may be lacking in completely answering the research questions.
  • N (Not yet, 6pts) – Research questions are somewhat addressed through a summary of results. The results are lacking in completely answering the research questions, or the results for one of the research questions are missing.
  • U (Unassessable, 2pts) – Conclusion is missing or does not demonstrate meaningful effort.

(Optional) Part 7: Appendix of additional figures, tables, and updates summary.

If you are struggling to keep your report within the 5-7 page limit, you may move some (not all) of your figures and tables to an optional appendix that will not count against your page limit. However, your report should stand on its own without the appendix. The appendix is for adding nuance to your results, not for giving you more space to discuss them. Succinctness is an important skill to practice when doing data science. Your grader is not expected to look at the appendix when grading.

If you strongly feel that a summary of project updates since the proposal is needed, you may put it in this appendix as well and mention in the introduction that it is there.

Checklist Before You Submit:

  1. Does your final report satisfy all general directions?
    1. 5-7 pages in length
    2. Standard margins (1 in.)
    3. Font size is 11-12 pt
    4. Line spacing is 1-1.5
    5. Final document is a pdf
  2. Do you have an Introduction and clearly stated Research Question(s)?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  3. Have you properly specified/cited one or more specific Data Sources and justified why they are relevant to the Research Question(s)?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  4. Did you state at least 3 Modules you used and how you used them, along with a justification and the concepts used at specific stages of the project?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  5. Have you reported all of your Results and Methods, including a specific explanation of how the results were generated?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  6. Have you defined clear Limitations to your results and Future Work?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?
  7. Have you written a comprehensive Conclusion?
    1. Do you feel as if this part meets the requirements of E (Exemplary) or S (Satisfactory)?

 

Exam 03 Logistics

This post outlines what Exam 3 will be like.

The exam has two parts: the Practicum and the in-person Exam.

Exam General Information

  • Modules covered:
    • 07 Statistical Inference
    • 08 Prediction & Supervised Machine Learning
  • Practice Exam

Practicum

  • When: Wednesday, 11/15, starts 12:01am EST, ends 11:59pm EST
    • It should take your group around 3-4 hours to complete, but your group can take as long as you want. It must be submitted before the end of the day.
  • This will have questions from both modules.
  • The rules for working together as a group are the same as in Practicum 02’s logistics.
    • To be clear, collaboration means that help or answers are shared in any direction. Students A and B could have shared answers with each other, or student A could have shared answers with student B without receiving student B’s answers (or vice versa).
  • All other details are the same as Practicum 02’s logistics.

Practicum Update

  • When: Thursday 11/30 – Saturday 12/02
    • Note: This is after Thanksgiving break and during Module 10
  • All other details are the same as Practicum 2 Update’s logistics.

In-person Exam

  • When: Friday 11/17, during regular class time
  • This will have questions from both modules.
  • Calculators are not required. You may use one to compute precision and recall precisely when explaining your interpretation of confusion matrices, but such calculations will not be required.
  • The exam does not have a reference sheet. We will give you the formulas for precision, recall, and accuracy on the front page of the exam.
  • All other details are the same as In-person Exam 2’s logistics.
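For reference while studying, the sketch below computes the three metrics from the counts in a binary confusion matrix using their standard definitions (the formulas printed on the exam's front page are the ones to rely on during the exam). The counts are made up, and `classification_metrics` is a hypothetical helper for illustration.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute precision, recall, and accuracy from binary confusion-matrix counts."""
    precision = tp / (tp + fp)  # of predicted positives, the fraction truly positive
    recall = tp / (tp + fn)     # of actual positives, the fraction the model found
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # fraction of all predictions correct
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Made-up counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
print(classification_metrics(tp=40, fp=10, fn=20, tn=30))
# {'precision': 0.8, 'recall': 0.6666666666666666, 'accuracy': 0.7}
```

Reading the example: the model is right 80% of the time when it predicts positive, but it finds only two thirds of the actual positives.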

Grading Scale and Points Allocation

This is the same as Exam 1’s logistics.

Module 09: Databases and SQL

  1. Prepare (due Mon 11/6)
    1. Content below
    2. Sakai quizzes
  2. Peer Instructions – See on the class forum
  3. Homework (due Sun 11/12) [LINK]
  4. Worked Example [LINK]

Content

09.A – Predictive Modeling and Regression

  1. Relational Database (24 min.)

09.B – Machine Learning and Classification

  1. SQL Querying (21 min.)
  2. SQL with Python and Pandas (12 min.)

Optional Supplements
