Course Information

  • Class Meeting: Tuesdays & Thursdays, 12:00 – 1:15 PM ET, Gross Hall 107
  • Instructor: Kristin Stephens-Martinez
  • Teaching Associate: Alex Chao
  • Graduate TAs: Yunzhou (David) Liu, Shao-Heng Ko
  • Head UTAs: Leah Okamura, Neel Gajjar
  • Undergraduate TAs: Arinton Davis, Ashley Chen, Jabari Kwesi, Julia Mitchell, Maddie Demming, Shari Tian, Trailokaya (Raj) Bajgain
  • Course Box Folder: All course material will be made available in the course Box folder
  • Course Forum: Ed Discussion (accessible via Sakai)
  • Course Grading: Gradescope (accessible via Sakai)
  • Zoom link/Course Videos/Streaming:
    • Lecture videos before 10/4 (modules 1-5) are accessible via Sakai -> Zoom meetings -> recordings.
    • Lecture videos since 10/4 (module 6 and on) are Duke-captured and accessible via Panopto.

Course Description

Data is the new currency. In every walk of life, people leave digital traces, which are stored and analyzed at both individual and population levels, by businesses for improving products and services, by governments for policy-making and national security, and by scientists for advancing the frontiers of human knowledge.

This course serves as an introduction to various aspects of working with data (acquisition, integration, querying, analysis, and visualization) and to data of different types, from unstructured text to structured databases. Through lectures and hands-on labs, the course covers both fundamental concepts and computational tools for working with data and applies them to real datasets in a capstone team project.

This course is open to students from both inside and outside computer science. Dealing with data requires more than just computer programming: What do we know about the processes underlying the data? What are the interesting questions to ask about data? What practical impacts can arise from the data? What constitutes ethical use? Therefore, we also welcome students with analytical backgrounds (e.g., statistics, math) or knowledge in fields that would benefit from data analysis (e.g., social and life sciences, public policy).


This course requires basic knowledge of programming (the equivalent of CompSci 101) and statistics. Additionally, each student should have taken at least one of the following (or their equivalent):

  • a 200-level (or above) computer science course;
  • a 100-level (or above) statistics course;
  • a 200-level (or above) math course.

If the prerequisites are not met, students must obtain the consent of the instructor to enroll.

If you have no programming background and want an introductory programming experience that focuses on data science, you should consider taking CompSci 116 Foundations of Data Science.