Technology Install Directions
This class uses Anaconda’s Individual Edition. It’s a free, open-source distribution containing Python, Jupyter Notebook, and (nearly) everything you need for data science in Python. Go to Anaconda’s Individual Edition and download the data science toolkit for your operating system with Python version >= 3.7. If you have trouble installing, check the Anaconda documentation. When you are done, we recommend opening a Jupyter Notebook (enter “jupyter notebook” at a command-line terminal, run Jupyter Notebook like a regular Windows program, or run the Anaconda Navigator program and select Jupyter Notebook) and familiarizing yourself with the Jupyter Notebook documentation.
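After installing, one quick way to confirm that your interpreter meets the version requirement above is a short check like the following. This is only a minimal sketch: the names `MIN_VERSION` and `python_is_recent_enough` are made up for illustration and are not part of Anaconda or of any course material.

```python
import sys

# Minimum version stated in the install directions (hypothetical constant name).
MIN_VERSION = (3, 7)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the version of the interpreter that is running this code.
print("Python", sys.version.split()[0])
if python_is_recent_enough():
    print("Version requirement met")
else:
    print("Please install Python 3.7 or later")
```

You can paste this into a new Jupyter Notebook cell or run it from a terminal; either way it reports on whichever Python is actually executing the code, which helps if you have more than one installation.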
Note that there will be no office hours when classes are not in session (including March 9-10 and April 12). All times are listed in US Eastern Time (i.e., local time at Duke). Office hours begin on Sunday, 1/24.
There are a few steps to get office hours support:
- First, click on the relevant office hours Zoom link below.
- Next, please log into My Digital Hand Beta and navigate to the Get Help Page.
- Then, click the Get Help Now button and answer the prompted questions.
- From there, you will be added to a waitlist. Simply return and wait on Zoom. One of the TAs will connect with you via a Zoom breakout room when it is your turn on the waitlist.
- After you have been helped, exit Zoom and return to My Digital Hand Beta. You will again be prompted to fill out some questions regarding the help you received.
- 7-11 pm – UTAs (Zoom Link)
- 11:30-1:30 pm – Nirav Patel (Zoom Link)
My Digital Hand Beta Student Instructions
The most important step is to make sure you sign up with your Duke email in the form email@example.com. There are two ways to sign up for MDH Beta if you haven’t done so yet.
- Navigate to https://beta.mydigitalhand.org/ and use entry code SREX4KM
- Scan this QR code and sign up directly
We will use Piazza for course communications, announcements, and online discussion. You can access Piazza from Sakai. You can post questions anonymously (or anonymously to classmates only), as well as message the instructors. Please use Piazza for technical questions first instead of email. In particular, we encourage you to use Piazza so that (a) you can get a faster response (multiple instructors or students can reply), (b) your questions don’t get lost in anyone’s email, and (c) other students can benefit from your questions or comments. Please read the pinned usage guidelines before posting.
Python for Data Science
If you are new to programming in Python, there are a lot of good tutorials available. The official documentation has one: https://docs.python.org/3/tutorial/index.html and Google also hosts a good tutorial with videos: https://developers.google.com/edu/python/. If you want a guide that is specific to transitioning to Python from Java, try http://python4java.necaiseweb.org/Main/TableOfContents. Note that if you are new to programming altogether, you do not meet the prerequisites for the class and should consider CS 101 or CS 116 instead; we are assuming that you have the background to pick up the basic syntax and functionality of Python on your own.
If you are new to scientific programming with Python, you may find this NumPy tutorial helpful: http://cs231n.github.io/python-numpy-tutorial/, along with the NumPy documentation: https://docs.scipy.org/doc/. The Python Data Science Handbook is also a very useful reference on the use of Python for data science, including helpful information on commonly used libraries like NumPy, Pandas, Matplotlib, and Scikit-Learn: https://jakevdp.github.io/PythonDataScienceHandbook/. For a gentler introduction, try this online data8 book developed for U.C. Berkeley’s Foundations of Data Science course and used in CS 116 at Duke: https://dukecs.github.io/textbook/chapters/intro.html.
To get all of the data science libraries you need together with a Python distribution on your local device, look at the Anaconda distribution, available for free: https://www.anaconda.com/distribution/. The Anaconda distribution contains all the Python resources you should need for this course. It includes Python 3 itself, all of the crucial libraries for data science (NumPy, Pandas, Matplotlib, scikit-learn, etc.), and development environments (notably the Spyder scientific computing IDE and Jupyter notebooks).
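To check that the libraries named above are visible to your Python installation, a small script along these lines may help. It is a sketch under stated assumptions, not an official course tool: `LIBRARIES` and `check_libraries` are hypothetical names chosen for illustration, and the list uses each package's import name (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names for the libraries listed above (hypothetical constant name).
# Note that scikit-learn is imported under the name "sklearn".
LIBRARIES = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_libraries(names=LIBRARIES):
    """Map each import name to True if Python can locate it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Print one line per library without actually importing it.
for name, found in check_libraries().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

The check only looks each package up rather than importing it, so it runs quickly. If anything is reported as missing, the Anaconda installation may not be the Python that is running the script.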
The Innovation Co-Lab hosts a variety of trainings, projects, and programming that might be interesting to an aspiring data scientist: https://colab.duke.edu. The Co-Lab also hosts regular office hours (and you can make an appointment) on a variety of technical subjects: https://colab.duke.edu/resources.
A Unix/Linux terminal (or bash shell) is the basic non-graphical interface with which all computer scientists (and likely all data scientists) need to be familiar. We may occasionally need to use terminals in the class to install packages, connect remotely, or execute code. If you have never worked with a terminal before, see a brief introduction to Shell Basics.
If you are unfamiliar with Gradescope or aren’t sure how to submit your assignment, see: https://www.gradescope.com/help#help-center-section-student-workflow.
Academic Resource Center
Want expert consultation about study habits, learning, time management, and more? Check out the Academic Resource Center (ARC) at Duke: https://arc.duke.edu.
Counseling and Psychological Services
Your thriving is about more than this class. If you need to talk to someone, consider Duke Counseling and Psychological Services (CAPS): https://studentaffairs.duke.edu/caps.