References and Reading
Expect to do additional reading after lecture to deepen your understanding of each topic. What you get out of the course depends on what you put in, and the course references are rich sources of information beyond the minimum that we can introduce during lecture. Specific suggestions for readings are marked on the schedule.
Homework Assignments
There will be regular weekly homework assignments, each consisting of a set of Jupyter notebooks that involve programming in Python and working with data to train, test, apply, and evaluate machine learning models. Homework assignments may also include test-style questions and may integrate societal and ethical considerations through readings and short-answer writing exercises. See the schedule for due dates.
Collaboration
You can work on homework assignments alone or in a group of two (that is, with a single partner). If working with a partner, you should not split up the assignment, but should actively collaborate (for example, by pair programming, discussing shared approaches, or reviewing one another’s code). Working with a partner is optional and we do not assign or match partners. If you work with a partner, both partners will receive the same grade based on the submitted work. You can switch partners between assignments, but once you begin working with a partner on a particular assignment you may not change partners for that assignment.
Generative AI/LLM Usage
You may use generative AI (such as DukeGPT or dedicated programming tools like Cursor) to help you complete homework assignments, but you should follow these guidelines for responsible usage that supports learning:
- Approved: Asking for conceptual clarification, getting help with debugging and editing, asking for implementation strategies, requesting examples of API usage, and using line-level autocomplete features such as those available with Cursor’s Tab or GitHub Copilot.
- Not approved: Uploading entire assignment parts/tasks for AI completion, submitting blocks of code written by AI (more than line-level autocomplete), or submitting any AI-generated text for short-answer questions. In general, agentic modes of programming tools such as Cursor, GitHub Copilot, or Claude Code are NOT approved, as these are used to generate entire blocks of code with minimal or no human input. These agentic tools may be used for help with debugging, but not for generating code.
Submission
Homework assignments should be submitted on the course Gradescope, which will be accessible through a link on Canvas. If you work with a partner, your group should make a single submission on Gradescope and ensure both members have been added using the group feature.
Exams
There will be two mandatory written in-class exams during the semester (Exam 1 and Exam 2). The exams will be 75 minutes long and are closed book with no notes allowed. Exams will focus on conceptual understanding, critical evaluation, and practical implementations and will be based primarily on lecture and assignment materials.
Each exam has an optional retake opportunity. Retakes cover the same topics with the same structure but use a different set of questions, providing a second opportunity to demonstrate mastery of the material. The retake is optional; if you take it and score higher than on the original exam, the retake score replaces the original score. If you score lower on the retake, your score does not change (that is, the retake cannot hurt your grade). The original exam is mandatory: you are only eligible for the retake if you completed the original exam.
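The retake rule amounts to keeping the higher of the two scores. A minimal sketch in Python, where the function name `effective_exam_score` and the use of `None` for a skipped retake are illustrative assumptions rather than part of any course tooling:

```python
from typing import Optional


def effective_exam_score(original: float, retake: Optional[float] = None) -> float:
    """Return the exam score that counts under the retake policy.

    Hypothetical helper: `retake` is None when the optional retake was skipped.
    """
    if retake is None:
        return original
    # A retake can only replace the original score when it is higher.
    return max(original, retake)


print(effective_exam_score(78, 85))  # retake is higher, so it replaces the original
print(effective_exam_score(78, 70))  # retake is lower, so the original stands
print(effective_exam_score(78))      # no retake taken
```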
Final Project
At the end of the course, students will complete a final project of their choice using any of the techniques from the course. You might choose to build a web or mobile application powered by AI, attempt a data science / machine learning competition or hackathon-style task (such as the featured challenges regularly posted on Kaggle), or reproduce the results of a research study published in a venue like NeurIPS or ICML.
The final project is due by 11:59 pm on Sunday, April 26 (the last day of the reading period prior to the beginning of the final examination period). Note that this course does not have a separate final exam, only the final project.
You are strongly encouraged to use state-of-the-art machine learning models for building your final project. While some projects may necessitate building and training a model completely from scratch, it is much more likely that most projects will be best served by fine-tuning, adapting, integrating, or otherwise working with existing models as a starting point. Similarly, you are strongly encouraged to select a project that can be completed using only publicly available data – that is, that will not require you to collect your own data given the time constraints of the semester.
Final Project AI Usage
You may use generative AI (such as DukeGPT, Cursor, GitHub Copilot, etc.) to help develop your final project, including the use of agentic modes for generating blocks of code (unlike for homework assignments).
However, you must attribute any AI-generated code, and you are fully responsible for the work you submit (that is, “but that’s what the AI said” is not a valid defense if you lose credit due to errors or inconsistencies in the code). Attribution must be provided in both of the following places:
- ATTRIBUTION.md, described below
- Code comments at the function/method docstring, class, or file level, depending on the amount of AI-generated code.
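As an illustration of function-level attribution, here is one possible docstring note. The wording and layout of the attribution line are assumptions, not a required course format, and `standardize` is a hypothetical example function:

```python
def standardize(values):
    """Scale a list of numbers to zero mean and unit variance.

    AI attribution (hypothetical format): first draft generated with an AI
    coding assistant; reviewed, tested, and edited by the project author.
    """
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5
    if std == 0:
        # Constant input: every value is already at the mean.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]
```

When most of a class or file is AI-generated, the same kind of note could instead go in the class docstring or in a comment at the top of the file.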
Final Project Collaboration
You can work on the final project alone or in a group of two (that is, with a single partner). Partner projects will be expected to demonstrate more substantial results. For this reason, you should work with a partner if you want to work on a more ambitious project, not just to reduce the amount of work.
Project Workshops
Several class periods near the end of the semester are designated as “project workshops.” This is dedicated time set aside for you to work on your projects in class with easy access to instructor and TA discussion, help, and support.
Submission Requirements
The final project should be submitted as a project repository including the following components. You will submit a link to this repository on Gradescope.
- Software: This should be a publicly accessible GitHub or GitLab repository with all project code, data, and documentation. The repository must follow the organizational structure outlined below to ensure clear presentation of your work and efficient grading.
- Project Demo: This is a 3–5-minute video that shows what your project does and why that matters. Think of this as the video you would show a non-specialist to pitch what you built. There is no reason to show any code in this video – you can use slides with visualizations or diagrams to provide motivation, show the running application, show experimental results, etc. This video should be included in your repository and linked prominently in your README file.
- Technical Walkthrough: This is a 5–10-minute video that shows and discusses how your project works. Think of this as the video you would show a fellow ML engineer to explain how you accomplished what you did. This video should help orient a grader to understand how your code works and where the machine learning concepts are being applied. It should also help a grader understand what was challenging about the project and where the significant technical contributions can be found. Like the project demo, this video should be stored in your repository and clearly linked in your README file.
Repository Organization Requirements
Your project repository must include the following structure and files to facilitate clear evaluation of your work:
- Required Files. The following markdown files should appear at the top level of the repository.
  - Your repository must contain a README.md file that serves as the main project overview and follows the template requirements specified below.
  - You must also include a SETUP.md file that provides clear installation and setup instructions for running your project. If your project uses external APIs or services, you should provide clear instructions for graders on how they can test your system.
  - Additionally, you must include an ATTRIBUTION.md file that contains detailed attribution of AI-generated code, external libraries, datasets, and any other resources used in your project.
- Directory Structure. Your repository should follow this organizational structure:
  - a src directory containing any of your source code that is not in Jupyter notebooks,
  - a data directory for any data files or data access scripts,
  - a models directory for any trained models, model configurations, or model loading scripts,
  - a notebooks directory for Jupyter notebooks used for exploration or analysis,
  - a videos directory containing your demo and technical walkthrough videos,
  - a docs directory for any additional documentation,
  - and a requirements.txt file or environment.yml file for any dependency management.
- README.md File Requirements: Your README.md file must include these sections in the specified order. Where appropriate, you should include sample outputs or screenshots in your README file to demonstrate your project’s functionality. Your README.md should be a concise reference and usage guide; your videos should present the primary explanation and discussion.
  - a Project Title and a short (1-3 sentence) description of what your project does,
  - a What it Does section that describes in one paragraph what your project does,
  - a Quick Start section that concisely explains how to run your project,
  - a Video Links section with direct links to your demo and technical walkthrough videos,
  - an Evaluation section that presents any quantitative results, accuracy metrics, or qualitative outcomes from testing,
  - and an Individual Contributions section for group projects that describes who did what.
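Before submitting, it may help to check your repository against the layout above. The following is a minimal sketch, not an official grading script. The helper name `find_missing` is hypothetical, `requirements.txt` is an assumed name for the dependency file, and treating every listed directory as mandatory is also an assumption (note that Git does not track empty directories):

```python
from pathlib import Path

# Names taken from the requirements above; "requirements.txt" is an assumed
# name for the dependency file.
REQUIRED_FILES = ["README.md", "SETUP.md", "ATTRIBUTION.md"]
REQUIRED_DIRS = ["src", "data", "models", "notebooks", "videos", "docs"]
DEPENDENCY_FILES = ["requirements.txt", "environment.yml"]  # at least one


def find_missing(repo_root):
    """Return the required items missing from repo_root (hypothetical helper)."""
    root = Path(repo_root)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    if not any((root / name).is_file() for name in DEPENDENCY_FILES):
        missing.append(" or ".join(DEPENDENCY_FILES))
    return missing


if __name__ == "__main__":
    # Run from the top level of the repository.
    for item in find_missing("."):
        print("missing:", item)
```

An empty result only means the files and directories exist; it does not check the README sections or the content of any file.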
Suggestions, Timeline, and Grading
You are not expected to work on the project during the first half of the semester, though you are encouraged to start thinking about what you might like to build: particular data domains, methodologies, competitions, or applications that you would like to work on for personal interest or as part of curating your own professional portfolio.
More detailed suggestions on possible projects and the complete grading rubric will be released as a separate handout by midterm (after the first exam). At this point you should begin concretely planning for your project and discussing ideas with prospective partners, if you are considering working with a partner. You should begin looking for data sources, exploring relevant models and APIs, experimenting, etc. It is very important that you have some concrete ideas for what you will do in your project by the time of the second exam.
After the second exam, the rest of the semester is dedicated entirely to working on your project, and you should expect to spend a substantial amount of time in and out of class working on your project during the last weeks of the course.