Author: Qianyu Yang

Module 04: Data Wrangling

  1. Prepare (due Mon 9/23)
    1. Content below
    2. Canvas quizzes
  2. Class engagement – See on the class forum
  3. Homework (due Sun 9/29) [LINK]
  4. Worked Example [LINK]

Content (Slides in the Box folder)

04.A – What is Wrangling

  1. Data sources, formats, and importing (26 min.)
  2. Common data cleaning problems (16 min.)
  3. Read Section 3.4 Handling Missing Data from Python Data Science Handbook

04.B – Wrangling Text

  1. Python string operations (16 min.)
  2. Introduction to regular expressions (18 min.)
  3. Read Section 3.10 Vectorized String Operations from Python Data Science Handbook

Optional Supplements

Module 10: Deep Learning

  1. Prepare (due Monday 11/27)
    1. Content below
    2. Canvas quizzes
  2. Peer Instructions – See on the class forum
  3. Homework (due Sunday 12/3) [Link]
  4. There are no worked examples

Content (Box)

10 Deep Learning

  1. Neural Networks and Applications (16 min.)
  2. Forward Propagation (10 min.)
  3. Gradient Descent (14 min.)
  4. Back Propagation (11 min.)
  5. Convolutional Neural Network (15 min.)
  6. Introducing Pytorch (23 min.)

Optional Supplements

PyTorch

Unlike most other libraries for this course, PyTorch is not included in the basic Anaconda installation. To use PyTorch, we suggest you choose one of two options.

  • Install PyTorch locally (for free). Follow the directions on the PyTorch website: select the stable build, your operating system, Conda (for Anaconda), Python, and CPU to see the install directions for your particular setup. CUDA supports hardware acceleration with NVIDIA graphics cards and is not necessary for this course.
  • Use PyTorch in a Jupyter notebook in the cloud (also for free). If you have a Google account, the easiest way to do this is with a Google Colab notebook; PyTorch will already be available to you in this cloud environment.
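Whichever option you choose, a quick check like the one below can confirm that PyTorch is visible to your notebook. This is only a sketch: the helper name `pytorch_status` is made up for illustration, and a CPU-only install is expected to report that CUDA is unavailable, which is fine for this course.

```python
import importlib.util


def pytorch_status():
    """Return basic facts about the PyTorch install, or None if it is missing.

    Hypothetical helper for illustration; it is not part of PyTorch.
    """
    # find_spec looks for the package without importing it, so this
    # does not raise an error when PyTorch is absent.
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "version": str(torch.__version__),
        # False is expected on a CPU-only install.
        "cuda_available": torch.cuda.is_available(),
    }


status = pytorch_status()
if status is None:
    print("PyTorch is not installed in this environment.")
else:
    print(status)
```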

You can find the official PyTorch documentation here. Of particular note are the PyTorch tutorials, including the PyTorch recipes, which serve as small examples of common tasks.

Book

The deep learning book is available free online and is authored by some of the leading experts in machine learning with deep artificial neural networks. It is very detailed and in-depth and is purely for those who are interested in learning more about deep learning theory now or in the future; you do not need to read the book for this course.

Module 08: Prediction & Supervised Machine Learning

  1. Prepare (due Mon 10/30)
    1. Content below
    2. Canvas quizzes
  2. Peer Instructions – See on the class forum
  3. Homework (due Sun 11/5) [Link]
  4. Worked Examples [Link]

Content (Slides in Box)

08.A Predictive Modelling and Regression

  1. Ordinary Linear Regression and Intro Scikit-Learn (21 min.)
  2. Nonlinear Regression and Scikit-Learn Preprocessing (13 min.)
  3. Binary Classification with Logistic Regression (22 min.)

Note: sklearn.metrics.plot_confusion_matrix, introduced on pp. 28-29 of the slides/video, was deprecated and has been removed as of scikit-learn 1.2; use sklearn.metrics.ConfusionMatrixDisplay instead. To see the updated slides, switch to the “slides” panel when viewing the 09.A.III video in Panopto.

08.B Machine Learning and Classification

  1. Naïve Bayes and Text Classification (20 min.) – The video has a typo on slide 10; see the PDF of the slides in Box for the fix.
  2. K-Nearest Neighbors and Training/Testing (31 min.)

Optional Supplements

Chapter 5 Machine Learning from the Python Data Science Handbook provides a very nice treatment of many of the topics from the above videos and more. If you are new to machine learning, we highly recommend that you read sections 5.1 “What is Machine Learning” through 5.4 “Feature Engineering” after completing the videos. After that, you can optionally read any of the In-Depth sections about specific algorithms for prediction.

In addition, the scikit-learn documentation itself provides several resources for working with the library:

Module 06: Combining Data

  1. Prepare (due Mon 10/9)
    1. Content below
    2. Canvas quizzes
  2. Peer Instructions – See on the class forum
  3. Homework (due Sun 10/15) [Link]
  4. Worked Example [Link]

Content (Slides in the Box Folder)

06.A – Summarizing Data

  1. Read Section 3.8 Aggregating and Grouping from Python Data Science Handbook.
  2. Read Section 3.9 Pivot Tables from Python Data Science Handbook.

06.B – Merging Data

  1. Read Section 3.6 Concat and Append from Python Data Science Handbook. Please note that the optional join_axes parameter mentioned in this section was deprecated and has since been removed from the pandas library (as of version 1.0), so you can skip over the details on this parameter.
  2. Read Section 3.7 Merge and Join from Python Data Science Handbook
  3. Record Linkage (8 min.)
  4. Fuzzy Matching (21 min.)
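Regarding the join_axes note in item 1 above: if you want the effect the reading describes (keeping only the columns of the first DataFrame), one common approach in current pandas is to concatenate and then call `reindex`. The sketch below uses small made-up DataFrames; the names `df5` and `df6` are illustrative only.

```python
import pandas as pd

# Two small frames that share columns B and C only (made-up data).
df5 = pd.DataFrame(
    {"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1", "C2"]}, index=[1, 2]
)
df6 = pd.DataFrame(
    {"B": ["B3", "B4"], "C": ["C3", "C4"], "D": ["D3", "D4"]}, index=[3, 4]
)

# Older pandas allowed: pd.concat([df5, df6], join_axes=[df5.columns])
# Without join_axes, concatenate first, then keep only df5's columns.
combined = pd.concat([df5, df6]).reindex(columns=df5.columns)
print(combined)
```

Rows that came from `df6` have missing values in column A, since `df6` has no such column, and column D is dropped.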

Optional Supplements

Module 02: Numpy & Pandas

  1. Prepare (due Mon 9/4)
    1. Content below
    2. Canvas quiz
  2. Peer Instructions – See on the class forum
  3. Homework (due Sun 9/10) [Link]
  4. Worked Example [Link]

Content (Slides in the Box folder)

2.A – Numpy (1 hour)

  1. Why Numpy (8 min.)
  2. Numpy Array Basics (15 min.)
  3. Numpy Universal Functions (20 min.)
  4. Numpy Axis (14 min.)

2.B – Pandas (45 min.)

  1. Why Pandas (7 min.)
  2. Pandas Series (19 min.)
  3. Pandas Dataframe (21 min.)

Optional Supplements

Module 01: Python, Central tendency, & Jupyter Notebook

  1. Prepare (due Mon 8/28)
  2. Peer Instructions – See on the class forum
  3. Homework (due Sun 9/3, 11:59 PM; late due Sun 9/10, no late tokens required) [Link]

Content (Slides in the Box folder)

1.A – Welcome to the class! (in-class on 8/30 or see recording)

1.B – Python3 (14 min.)

  1. Python vs. Java (3 min.)
  2. Data Types (2 min.)
  3. Iteration, Functions, Classes (7 min.) – slide 19 has a typo; the PDF has been fixed
  4. sorted() function documentation (2 min.)

1.C – Python for Data Science

  1. Anaconda and Jupyter (10 min.)
  2. Jupyter Notebook Demo (11 min.)

1.D – Central Tendency

  1. If you need a refresher/overview on the definitions of central tendency: mean, median, and mode
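As a quick refresher in code, the three measures can be computed with Python's built-in `statistics` module. The list of values below is made up for illustration.

```python
import statistics

# Made-up sample; the 10 is an outlier that pulls the mean upward.
values = [1, 2, 2, 3, 10]

mean_value = statistics.mean(values)      # sum / count = 18 / 5
median_value = statistics.median(values)  # middle value once sorted
mode_value = statistics.mode(values)      # most frequent value

print(mean_value, median_value, mode_value)  # 3.6 2 2
```

Note how the single outlier moves the mean well above the median and mode.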

Optional Supplements
