Author Archives: Shao-Heng Ko

Module 06: Combining Data

  1. Prepare (due Su 10/2)
    1. Content below
    2. Sakai quizzes
  2. Peer Instructions – see the class forum
  3. Homework (due F 10/7) [Notebook]
  4. Worked Example [Notebook]

Content (Slides in the Box folder)

6.A – Summarizing Data

  1. Read Section 3.8 Aggregating and Grouping from Python Data Science Handbook.
  2. Read Section 3.9 Pivot Tables from Python Data Science Handbook.
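The readings above cover split-apply-combine aggregation with `groupby` and reshaping with `pivot_table`. Here is a minimal sketch of both on a small hypothetical sales table. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: one row per sale
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
    "revenue": [100, 150, 200, 50, 120],
})

# Split-apply-combine: split rows by region, apply sum and mean, combine into one table
by_region = sales.groupby("region")["revenue"].agg(["sum", "mean"])

# Pivot table: regions as rows, quarters as columns, summed revenue in each cell
pivot = sales.pivot_table(index="region", columns="quarter",
                          values="revenue", aggfunc="sum")

print(by_region)
print(pivot)
```

A pivot table is essentially a two-key `groupby` whose second key is spread across the columns, which is why both operations take an aggregation function.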

6.B – Merging Data

  1. Record Linkage (8 min.)
  2. Read Section 3.6 Concat and Append from Python Data Science Handbook. Note that the optional join_axes parameter mentioned in this section has since been deprecated and removed from the pandas library; you can skip the details on this parameter.
  3. Read Section 3.7 Merge and Join from Python Data Science Handbook.
  4. Fuzzy Matching (21 min.)
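As a rough illustration of the merging topics above, the sketch below stacks two hypothetical tables with `pd.concat`, joins them on a key with `pd.merge`, and then shows one simple way to match near-duplicate keys. The fuzzy step uses the standard library's `difflib.get_close_matches`; the helper `fuzzy_key` and its 0.8 cutoff are assumptions for illustration, and the lecture may use a different similarity measure.

```python
import difflib
import pandas as pd

# Hypothetical tables that partly describe the same people
grades = pd.DataFrame({"name": ["Alice Smith", "Bob Jones"], "grade": [92, 85]})
emails = pd.DataFrame({"name": ["Alice Smith", "Carol Lee"],
                       "email": ["alice@example.edu", "carol@example.edu"]})

# Concatenation stacks rows; a column missing from one table is filled with NaN
stacked = pd.concat([grades, emails], ignore_index=True)

# Merge joins rows that share a key; how="outer" keeps unmatched rows from both sides
merged = pd.merge(grades, emails, on="name", how="outer")

# Exact merges miss near-duplicate keys such as "Jon Smyth" vs "John Smith".
# Hypothetical helper: return the most similar candidate above a similarity cutoff.
def fuzzy_key(name, candidates, cutoff=0.8):
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

roster = ["John Smith", "Mary Major"]
print(fuzzy_key("Jon Smyth", roster))
```

The cutoff trades false matches against missed ones; in practice it is worth inspecting borderline matches by hand before merging on them.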

Optional Supplements

Module 04: Data Wrangling

  1. Prepare (due Su 9/18)
    1. Content below
    2. Sakai quizzes
  2. Peer Instructions – see the class forum
  3. Homework (due F 9/23) [Notebook]
  4. Worked Examples [Notebook]

Content (Slides in the Box folder)

4.A – What is Wrangling

  1. Data sources, formats, and importing (26 min.)
  2. Common data cleaning problems (16 min.)
  3. Read Section 3.4 Handling Missing Data from Python Data Science Handbook.
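To make the missing-data reading concrete, here is a minimal sketch of the three most common moves in pandas: detect, drop, and fill. The readings are hypothetical, and filling with the mean is only one possible imputation choice.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with gaps; pandas treats both NaN and None as missing
readings = pd.Series([1.0, np.nan, 3.0, None])

n_missing = readings.isnull().sum()          # count missing entries
dropped = readings.dropna()                  # discard missing entries
filled = readings.fillna(readings.mean())    # impute with the mean of observed values

print(n_missing, dropped.tolist(), filled.tolist())
```

Dropping is safest when few values are missing at random; imputation keeps rows but can bias summary statistics, so the right choice depends on the data.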

4.B – Wrangling Text

  1. Python string operations (16 min.)
  2. Introduction to regular expressions (18 min.)
  3. Read Section 3.10 Vectorized String Operations from Python Data Science Handbook.
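The three text-wrangling topics above fit together as shown in this sketch: plain string methods on one value, a regular expression to pull out a pattern, and the same ideas applied to a whole pandas Series through the `.str` accessor. The addresses are hypothetical, and the ZIP-code pattern is a simplified assumption.

```python
import re
import pandas as pd

raw = "  Springfield, IL 62701  "

# Plain Python string methods on a single value
clean = raw.strip().lower()            # "springfield, il 62701"
city, rest = clean.split(", ")         # "springfield", "il 62701"

# A regular expression capturing a standalone 5-digit ZIP code (simplified pattern)
zip_match = re.search(r"\b(\d{5})\b", raw)
zip_code = zip_match.group(1) if zip_match else None

# The same operations vectorized over a Series via the .str accessor
addresses = pd.Series(["Springfield, IL 62701", "Portland, OR 97201", "unknown"])
zips = addresses.str.extract(r"(\d{5})", expand=False)   # NaN where there is no match
cities = addresses.str.split(",").str[0].str.upper()

print(city, zip_code)
print(zips.tolist(), cities.tolist())
```

The vectorized `.str` methods mirror the built-in string methods but skip missing values instead of raising errors, which is why they are preferred over explicit loops on DataFrame columns.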

Optional Supplements

Module 02: Numpy & Pandas

  1. Prepare (due Su 9/4)
    1. Content below
    2. Sakai quiz
  2. Peer Instructions – see the class forum
  3. Homework (due F 9/9) [Notebook]
  4. Worked Example [Notebook]

Content (Slides in the Box folder)

2.A – Numpy (1 hour)

  1. Why Numpy (8 min.)
  2. Numpy Array Basics (15 min.)
  3. Numpy Universal Functions (20 min.)
  4. Numpy Axis (14 min.)
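A compact sketch touching each NumPy topic above: creating and reshaping an array, applying universal functions elementwise, and reducing along an axis. The array contents are arbitrary.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
print(a.shape)                   # (2, 3)

# Universal functions operate elementwise, with no explicit Python loop
squares = np.square(a)           # same result as a ** 2
scaled = a * 10 + 1              # a scalar is broadcast across every element

# axis=0 collapses the rows (one result per column);
# axis=1 collapses the columns (one result per row)
col_sums = a.sum(axis=0)         # [3, 5, 7]
row_sums = a.sum(axis=1)         # [3, 12]
```

A helpful way to remember the axis argument is that it names the dimension that disappears from the result's shape.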

2.B – Pandas (45 min.)

  1. Why Pandas (7 min.)
  2. Pandas Series (19 min.)
  3. Pandas Dataframe (21 min.)
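The sketch below contrasts the two core pandas structures from the videos above: a Series as a labeled one-dimensional array, and a DataFrame as a table of Series sharing an index. The labels and numbers are hypothetical.

```python
import pandas as pd

# A Series: values paired with an index of labels (hypothetical values)
population = pd.Series({"A": 120, "B": 80, "C": 200})
print(population["B"])                  # label-based access
print(population[population > 100])     # a boolean mask keeps the matching labels

# A DataFrame: several Series aligned on one shared index
area = pd.Series({"A": 10, "B": 4, "C": 25})
df = pd.DataFrame({"population": population, "area": area})
df["density"] = df["population"] / df["area"]   # vectorized column arithmetic

print(df.loc["C", "density"])   # .loc selects by label
print(df.iloc[0])               # .iloc selects by integer position
```

Keeping `.loc` for labels and `.iloc` for positions avoids the ambiguity that plain `[]` indexing can introduce when an index contains integers.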

Optional Supplements

Module 01: What is Data Science, Anaconda, Python, & Jupyter

  1. Prepare (due W 8/31)
    1. Content below
    2. Sakai quiz
    3. Install Anaconda (see the Resources page for more instructions)
  2. Peer Instructions – see the class forum
  3. Homework (due Sep 2nd, 2022, 11:59 PM) [Notebook]

Content (Slides in the Box folder)

1.A – What is Data Science? (in-class on 8/30 or see recording)

1.B – Python3 (14 min.)

  1. Python vs. Java (3 min.)
  2. Data Types (2 min.)
  3. Iteration, Functions, Classes (7 min.) – slide 19 has a typo; the PDF has been fixed
  4. sorted() function documentation (2 min.)
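For readers coming from Java, here is a brief sketch of the Python basics listed above: built-in types without declarations, a loop inside a function, a small class, and `sorted()` with its `key` and `reverse` parameters. The `Student` class and `mean` function are hypothetical examples, not course code.

```python
# Built-in types need no declarations, unlike Java
name = "pandas"            # str
scores = [88, 95, 72]      # list (mutable)
point = (1, 2)             # tuple (immutable)
ages = {"ann": 31}         # dict

# Iteration inside a function
def mean(values):
    total = 0
    for v in values:
        total += v
    return total / len(values)

# A minimal class; self plays the role of Java's implicit this
class Student:
    def __init__(self, name, score):
        self.name = name
        self.score = score

    def __repr__(self):
        return f"Student({self.name!r}, {self.score})"

roster = [Student("ann", 88), Student("bo", 95), Student("cy", 72)]

# sorted() returns a new list and leaves the input unchanged;
# key= chooses what to compare, reverse=True sorts in descending order
by_score = sorted(roster, key=lambda s: s.score, reverse=True)
print(mean(scores), by_score)
```

Because `sorted()` returns a new list, it works on any iterable; the list method `list.sort()` instead sorts in place and returns `None`.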

1.C – Python for Data Science

  1. Anaconda and Jupyter (10 min.)
  2. Jupyter Notebook Demo (11 min.)

Optional Supplements