1. Prepare (due Mon 10/30)
    1. Content below
    2. Canvas quizzes
2. Peer Instructions – see the class forum
3. Homework (due Sun 11/5) [Link]
4. Worked Examples [Link]

Content (Slides in Box)

08.A Predictive Modelling and Regression

  1. Ordinary Linear Regression and Intro Scikit-Learn (21 min.)
  2. Nonlinear Regression and Scikit-Learn Preprocessing (13 min.)
  3. Binary Classification with Logistic Regression (22 min.)

Note: sklearn.metrics.plot_confusion_matrix, introduced on pp. 28-29 of the slides/video, is deprecated (and was removed in scikit-learn 1.2); use sklearn.metrics.ConfusionMatrixDisplay instead. To see the updated slides, switch to the “slides” panel when viewing the 09.A.III video in Panopto.

08.B Machine Learning and Classification

  1. Naïve Bayes and Text Classification (20 min.) – The video has a typo on slide 10; see the PDF of the slides in Box for the fix.
  2. K-Nearest Neighbors and Training/Testing (31 min.)

Optional Supplements

Chapter 5, “Machine Learning”, of the Python Data Science Handbook covers many of the topics from the above videos, and more, in a very accessible way. If you are new to machine learning, we highly recommend that you read sections 5.1 “What Is Machine Learning?” through 5.4 “Feature Engineering” after completing the videos. After that, you can optionally read any of the In-Depth sections about specific prediction algorithms.

In addition, the scikit-learn documentation itself provides several resources for working with the library: