Technology Install Directions
This class uses Anaconda’s Individual Edition. It’s a free, open-source distribution containing Python, Jupyter Notebook, and (nearly) everything for data science in Python. Go to Anaconda’s Individual Edition and download the data science toolkit for your operating system with Python version >= 3.7. If you have trouble installing, check the Anaconda documentation. When you are done, we recommend trying to open a Jupyter Notebook (enter “jupyter notebook” at a command line terminal, run Jupyter Notebook like a regular Windows program, or run the Anaconda Navigator program and select Jupyter Notebook) and familiarizing yourself with the Jupyter Notebook documentation.
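As a quick sanity check after installing, the short sketch below compares the running interpreter against the Python >= 3.7 requirement stated above. It uses only the standard library; the names `MIN_VERSION` and `meets_minimum` are our own illustrative choices, not part of Anaconda or any course tooling.

```python
import sys

# Minimum Python version required for this class (from the directions above).
MIN_VERSION = (3, 7)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the interpreter that is actually running this code.
print("Running Python", sys.version.split()[0])
if meets_minimum():
    print("This interpreter meets the course requirement.")
else:
    print("This interpreter is older than 3.7; check your Anaconda install.")
```

You can paste this into the first cell of a new Jupyter Notebook or save it to a file and run it from a terminal. If the reported version is older than expected, the notebook may be using a different Python installation than the one Anaconda installed.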
We will be using Gradescope to submit labs and projects. If you are unfamiliar with Gradescope or aren’t sure how to submit your assignment, see the Gradescope help document.
Jupyter Notebook Container
If something happens to your computer or you cannot install Anaconda on it, we’ve reserved containers for you through OIT. Go to the container manager and look for “JupyterLab with Pytorch for Data Science and Machine Learning”. Click on the button to reserve your instance of the notebook. Once your instance is reserved you can click on “Pytorch” among your reserved containers, start the server, and upload any necessary files.
All times are listed in US Eastern Time (i.e., local time at Duke). Office hours begin Sunday, 8/29.
- Prof. Stephens-Martinez
- Monday 12 – 1 pm ET (virtual, see Sakai for link)
- Thursday 11:30 am – 12:30 pm ET (LSRC D224)
TA Office Hours
- 7 – 11 pm ET: UTAs (Zoom)
- 7 – 11 pm ET: UTAs (Zoom)
- 7 – 9 pm ET: UTAs (Zoom)
- 9 – 11 pm ET: UTAs (Zoom)
- 12 – 2 pm ET: Chang Xu (In-person: outside LSRC D344)
There are a few steps to get office hours support:
- If attending remotely, go to the Zoom link available in Sakai.
- Log into My Digital Hand Beta and navigate to the Get Help Page.
- Click the Get Help Now button and answer the prompted questions.
- You will be added to a waitlist. Wait in person or on Zoom. If you are in person, a TA will call for you. If you are on Zoom, a TA will connect with you via a Zoom breakout room when it is your turn on the waitlist.
- After you have been helped, return to My Digital Hand Beta. You will again be prompted to fill out some questions regarding the help you received.
My Digital Hand Beta Student Instructions
To sign up, go to My Digital Hand Beta and use entry code SHUPESL. The most important step is to make sure you sign up with your Duke email in the form email@example.com.
We will use Ed Discussion for online discussion. You access it from Sakai. You can post questions anonymously (or anonymous to classmates), as well as message the instructors. Please use it for technical questions first instead of email. In particular, we encourage you to use it so that (a) you can get a faster response (multiple instructors or students can reply), (b) your questions don’t get lost in anyone’s email, and (c) other students can benefit from your questions or comments.
Python for Data Science
If you are new to programming in Python, there are a lot of good tutorials available. The official documentation has one: https://docs.python.org/3/tutorial/index.html and Google also hosts a good tutorial with videos: https://developers.google.com/edu/python/. If you want a guide that is specific to transitioning to Python from Java, try http://python4java.necaiseweb.org/Main/TableOfContents. Note that if you are new to programming altogether, you do not meet the prerequisites for the class and should consider CS 101 or CS 116 instead; we are assuming that you have the background to pick up the basic syntax and functionality of Python on your own.
If you are new to scientific programming with Python, you may find this NumPy tutorial helpful, along with the NumPy documentation. The Python Data Science Handbook is also a very useful reference on the use of Python for data science, including helpful information on commonly used libraries like NumPy, Pandas, Matplotlib, and Scikit-Learn. For a gentler introduction, try this online data8 book developed for U.C. Berkeley’s Foundations of Data Science course and used in CS 116 at Duke.
To get all of the data science libraries you need together with a Python distribution on your local device, look at the Anaconda distribution, available for free. The Anaconda distribution of Python contains all of the Python resources you should need for this course. It includes Python 3 itself, all of the crucial libraries for data science (NumPy, Pandas, Matplotlib, scikit-learn, etc.), and development environments (notably the Spyder scientific computing IDE and Jupyter notebooks).
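To confirm that the libraries named above are visible to your Python installation, a minimal sketch like the following may help. It uses the standard library's `importlib.util.find_spec` to look up each package without importing it. The list `COURSE_PACKAGES` and the helper `missing_packages` are illustrative names of our own; note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names of the libraries mentioned above (scikit-learn imports as "sklearn").
COURSE_PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(COURSE_PACKAGES)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

If any package is reported as not found, the code is probably running under a Python installation other than the one Anaconda provides.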
The Innovation Co-Lab hosts a variety of trainings, projects, and programming that might be interesting to an aspiring data scientist. The Co-Lab also hosts regular office hours (and you can make an appointment) on a variety of technical subjects.
A Unix/Linux terminal (or bash shell) is the basic non-graphical interface with which all computer scientists (and likely all data scientists) need to be familiar. We may occasionally need to use terminals in the class to install packages, connect remotely, or execute code. If you have never worked with a terminal before, see a brief introduction to Shell Basics.
Academic Resource Center
Want expert consultation about study habits, learning, time management, and more? Check out the Academic Resource Center (ARC) at Duke.
Counseling and Psychological Services
Your thriving is about more than this class. If you need to talk to someone, consider Duke Counseling and Psychological Services (CAPS).