Monthly Archives: October 2021

Module 7: Deep Learning

  1. Prepare (soft due Tu 11/9, hard due M 11/15)
    1. Content below
    2. Sakai quizzes
  2. Group Worksheet (soft due W 11/10, hard due M 11/15)
    1. Part 1
    2. Part 2
    3. Part 3
    4. Part 4
  3. Practice (due M 11/22)
  4. Perform – There is no Perform for this module

Content

7 Deep Learning

  1. Neural Networks and Applications (16 min.)
  2. Forward Propagation (10 min.)
  3. Gradient Descent (14 min.)
  4. Back Propagation (11 min.)
  5. Convolutional Neural Network (15 min.)
  6. Introducing Pytorch (23 min.)

Optional Supplements

The deep learning book is available free online and is authored by some of the leading experts in machine learning with deep artificial neural networks. It is very detailed and in-depth and is purely for those who are interested in learning more about deep learning theory now or in the future; you do not need to read the book for this course.

Unlike most other libraries for this course, PyTorch is not included in the basic Anaconda installation. To use PyTorch, we suggest you choose one of two options.

  • Install PyTorch locally (for free). You can see the directions on the website: select the stable build, your operating system, Conda (for Anaconda), Python, and CPU to see install directions for your particular setup. (CUDA is used to support hardware acceleration with NVIDIA graphics cards and is not necessary for this course.)
  • Use PyTorch in a Jupyter notebook in the cloud (also for free). The easiest way to do this if you have a Google account is with a Google Colab notebook; PyTorch will already be available to you in this cloud environment.

You can find the official PyTorch documentation here. Of particular note are the PyTorch tutorials, including PyTorch recipes, which serve as small examples of common tasks.
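After either option, a quick check like the following can confirm that PyTorch is importable. This is a minimal sketch rather than part of the official directions: the helper name `describe_pytorch` is hypothetical, and it relies only on `torch.__version__` and `torch.cuda.is_available()`. On a CPU-only install the CUDA check should report False, which is fine for this course.

```python
import importlib.util


def describe_pytorch():
    """Return a short status string about the local PyTorch install.

    Hypothetical helper for illustration; not part of PyTorch itself.
    """
    # find_spec returns None when the package cannot be found,
    # so this avoids an ImportError on machines without PyTorch.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment."

    import torch

    # CUDA is optional for this course; False just means CPU-only.
    return (
        f"PyTorch {torch.__version__} found; "
        f"CUDA available: {torch.cuda.is_available()}"
    )


print(describe_pytorch())
```

Running this in a Google Colab cell or in a local Anaconda environment should print a single status line either way.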

Module 6: Prediction & Supervised Machine Learning

  1. Prepare (soft due Tu 10/26, hard due M 11/1)
    1. Content below; if you are new to machine learning, some of the optional supplements are strongly recommended.
    2. Sakai quizzes
  2. Group Worksheet (soft due W 10/27, hard due M 11/1)
  3. Practice (due M 11/8)
  4. Perform (due M 11/22)

Content

6.A Predictive Modeling and Regression

  1. Ordinary Linear Regression and Intro Scikit-Learn (21 min.)
  2. Nonlinear Regression and Scikit-Learn Preprocessing (13 min.)
  3. Binary Classification with Logistic Regression (22 min.)

6.B Machine Learning and Classification

  1. Naïve Bayes and Text Classification (20 min.) – The video has a typo on slide 10; see the pdf of the slides in Box for the fix.
  2. K-Nearest Neighbors and Training/Testing (31 min.)

Optional Supplements

Chapter 5, “Machine Learning,” of the Python Data Science Handbook provides a very nice treatment of many of the topics from the above videos and more. If you are new to machine learning, we highly recommend that you read sections 5.1, “What is Machine Learning,” through 5.4, “Feature Engineering,” after completing the videos. After that, you can optionally read any of the In-Depth sections about specific algorithms for prediction.

In addition, the scikit-learn documentation itself provides several resources for working with the library.

Project: Prototype

Due: Monday 11/08, no late period

General Directions

The prototype deliverable is intended to demonstrate a proof of concept for your final project report. Large multi-week projects are challenging, so this deliverable is intended to provide additional structure to ensure you are making progress and are on a path towards success. It consists of a written report, detailed below, along with any accompanying data, code, or other supplementary resources that demonstrate your progress so far in the project. You can think of it as a rough draft for your final project. The report should stand on its own so that it makes sense to someone who has not read your proposal.

The report should contain the four parts defined below. In terms of length, it should be about 3-4 pages using standard margins (1 in.), font (11-12 pt), and line spacing (1-1.5). A typical submission is around 2-3 pages of text and 3-4 pages overall with tables and figures. You should convert your written report to a pdf and upload it to Gradescope under the assignment “Project Prototype” by the due date. Be sure to include your names and NetIDs in your final document and use the group submission feature on Gradescope. You do not need to upload your accompanying data, code, or other supplemental resources demonstrating your work to Gradescope; instead, your report should contain instructions on how to access these resources (see Part 3 below for more details).

Part 1: Introduction and Research Questions

Your prototype report should begin by reintroducing your topic and restating your research question(s) as in your proposal. Your research question(s) should be (1) substantial, (2) feasible, and (3) relevant. Briefly justify each of these points as in the project proposal. You can start with the text from your proposal, but you should update your introduction and research questions to reflect changes in or refinements of the project vision. Specifically, point out what has changed since the proposal. Your introduction should be sufficient to provide context for the rest of your report.

Part 2: Data Sources

After your introduction and research questions, your prototype should discuss the data you have collected and are using to answer your research questions. Be specific: name the datasets you are using and where they were collected from / how they were prepared. Briefly justify why your data are appropriate and sufficient to address your research questions. As in the introduction, you can begin with the text from your proposal but be sure to update it to fit with your evolving project.

Part 3: Preliminary Results and Methods

The preliminary results section of your report should summarize the results obtained so far in the project. Where possible, results should be summarized using clearly labeled tables or figures and supplemented with a written explanation of the significance of the results with respect to the research questions outlined in the previous section. Your results do not need to be final or conclusive for your entire project but should demonstrate substantial effort and progress and should provide concrete proof of concept or initial analysis with respect to your research questions.

Your results should be specific about exactly what data were used and how the results were generated. For example, if you scraped multiple web databases, merged them, and created a visualization, then you should explain how each step was conducted in enough detail that an informed reader could reasonably be expected to reproduce your results with time and effort. Simply saying “we cleaned the data and dealt with missing values” is not sufficient detail.
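One way to make such steps reportable is to record what each one did as it runs. The sketch below is purely illustrative: the records, the column names (`city`, `temp_f`), and the helpers `drop_missing` and `drop_duplicates` are made up for this example, and your own project may call for different handling of missing values.

```python
# Toy records standing in for a scraped dataset (made-up values).
records = [
    {"city": "Durham", "temp_f": 71.0},
    {"city": "Raleigh", "temp_f": None},
    {"city": "Durham", "temp_f": 71.0},  # exact duplicate of the first row
    {"city": "Cary", "temp_f": 68.5},
]


def drop_missing(rows, column):
    """Remove rows whose value in `column` is None; report how many."""
    kept = [row for row in rows if row[column] is not None]
    return kept, len(rows) - len(kept)


def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping the first occurrence."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept, len(rows) - len(kept)


# Record each cleaning step so it can be described in the report.
cleaning_log = []

cleaned, n_missing = drop_missing(records, "temp_f")
cleaning_log.append(f"Dropped {n_missing} row(s) with missing temp_f")

cleaned, n_dupes = drop_duplicates(cleaned)
cleaning_log.append(f"Dropped {n_dupes} exact duplicate row(s)")

cleaning_log.append(f"{len(cleaned)} of {len(records)} rows remain")

for line in cleaning_log:
    print(line)
```

A log like this can be summarized in the methods discussion so that readers see exactly which rows were removed and why.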

Your report itself should include an explanation of your methods, but it should also contain instructions on how to access your full implementation (that is, your code, data, and any other supplemental resources like additional charts or tables). The simplest way to do so is to include a link to the Box folder, GitLab repo, or whatever other platform your group is using to house your data and code.

Part 4: Reflection and Next Steps

In this part, you should begin by reflecting on the progress of your project so far. Address the following:

  1. What has been successful in the project so far or what is essentially complete and ready for the final report?
  2. What has been challenging in the project so far or what is incomplete in the prototype that needs to be finished for the final report?
  3. What are your next steps? These should be concrete and specific actions that your group will take to address the challenges identified in order to complete a successful final project.

Feedback and Grading Rubric

Prototypes will be evaluated on the following criterion-based rubric. Prototypes satisfying all criteria will receive full credit. Formative feedback (comments and suggestions) will also be provided for each prototype by your project group mentor.

  1. Submits a relevant document satisfying general requirements
  2. Includes a brief introduction to the topic of interest
  3. Poses one or more concrete research questions
  4. Provides a reasonable justification that research questions are substantial
  5. Provides a reasonable justification that research questions are feasible
  6. Provides a reasonable justification that research questions are relevant
  7. Explains how the topic/research questions have (or have not) changed since the proposal
  8. Includes a discussion of concrete/specific data sources
  9. Provides reasonable justification that data sources are appropriate for research questions
  10. Provides some specific preliminary results in the form of analysis, tables, visualization, etc.
  11. Results demonstrate substantial effort and progress toward addressing research questions; they do not have to be complete or exhaustive
  12. Methods used to obtain results are described in sufficient detail to understand and interpret results
  13. Provides a link/reference to additional materials (e.g., code and data stored in Box or GitLab)
  14. Reflects on successes / what is fairly complete in the project so far
  15. Reflects on challenges / what is incomplete in the project so far
  16. Discusses concrete/specific action items to complete the final project

Module 5B: Visualization

  1. Prepare (soft due Th 10/14, hard due M 10/18)
    1. Content below
    2. Sakai quizzes
  2. Group Worksheet (soft due F 10/15, hard due M 10/18)
  3. Practice (due M 10/25)
  4. Perform (due M 11/8)

Content

5B.A Data Visualization and Design

  1. Why Visualize? (11 min.)
  2. Basic Plot Types (17 min.)
  3. Dos and Don’ts (10 min.)

5B.B Visualization in Python

  1. Intro to Python Visualization Landscape (7 min.)
  2. Seaborn Introduction (17 min.)
  3. Seaborn Examples (17 min.)

Optional Supplements

Module 5A: Databases & SQL

  1. Prepare (soft due Tu 10/12, hard due M 10/18)
    1. Content below
    2. Sakai quizzes
  2. Group Worksheet (soft due W 10/13, hard due M 10/18)
  3. Practice (due M 10/25)
  4. Perform (due M 11/8)

Content

5A.A – Relational Database (24 min.)

5A.B

  1. SQL Querying (21 min.)
  2. SQL with Python and Pandas (12 min.)

Optional Supplements