Data is the new currency. In every walk of life, people leave digital traces, which are stored and analyzed at both individual and population levels: by businesses to improve products and services, by governments for policy-making and national security, and by scientists to advance the frontiers of human knowledge.

This course introduces various aspects of working with data (acquisition, integration, querying, analysis, and visualization) and data of different types, from unstructured text to structured databases. Through lectures and hands-on labs, the course covers both fundamental concepts and computational tools for working with data, and applies them to real datasets in a capstone team project.

This course is open to students from both inside and outside computer science. Dealing with data requires more than just computer programming: What do we know about the processes underlying the data? What are the interesting questions to ask about the data? What practical impacts can arise from the data? What constitutes ethical use? Therefore, we also welcome students with analytical backgrounds (e.g., statistics, math) or knowledge of fields that would benefit from data analysis (e.g., social and life sciences, public policy).


This course requires basic knowledge of programming (the equivalent of CompSci 101) and statistics. Additionally, each student should have taken at least one of the following (or their equivalent):

  • a 200-level (or above) computer science course;
  • a 100-level (or above) statistics course that involved some programming;
  • a 200-level (or above) math course that involved some programming.

If the prerequisites are not met, students must obtain the consent of the instructor to enroll.

If you have no programming background and want an introductory programming experience that focuses on data science, you should consider taking CompSci 116, Foundations of Data Science.