Setup

This class uses the Python programming language and several scientific computing libraries, including NumPy, Scikit-Learn, and PyTorch. The basic development and execution environment will be Jupyter, a scientific computing platform that supports Python. All of these software tools are free.

Computing Environments

Some of the work in this class will be computationally demanding, requiring GPU (Graphics Processing Unit) compute and at least 10 GB of working memory (we will work with models with up to billions of parameters, but not tens of billions). You have three options for developing and executing code:
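To see why models with billions of parameters call for this much memory, the arithmetic can be sketched as follows. This is a rough illustration, not a course requirement: the function name is hypothetical, the 4-byte (float32) and 2-byte (float16) precisions are assumed examples, and the estimate covers the weights alone (activations, gradients, and optimizer state add more).

```python
def estimate_weight_memory_gb(num_params, bytes_per_param=4):
    """Estimate memory for model weights alone, in GB (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Compare a 1-billion and a 10-billion parameter model at two precisions.
for num_params in (1_000_000_000, 10_000_000_000):
    for label, nbytes in (("float32", 4), ("float16", 2)):
        gb = estimate_weight_memory_gb(num_params, nbytes)
        print(f"{num_params:,} parameters in {label}: about {gb:.0f} GB")
```

Under these assumptions, a model with a few billion parameters fits within roughly 10 GB, while one with tens of billions does not.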

Option 1: Google Colab

Google provides a free browser-based Jupyter environment with CPU and GPU compute support at colab.research.google.com. This is a no-install option, but it requires a Google account to log in and has usage limits for free accounts (though you should be able to complete your coursework without exceeding those limits).

Option 2: Using JupyterLab with Duke CS Servers

The Computer Science Department hosts OnDemand access to a browser-based JupyterLab running on department clusters, with CPU and GPU compute support, at ondemand.cs.duke.edu (log in with your NetID, then select Interactive Apps → JupyterLab). If you are a declared computer science major, you should already have login access. If you are not a declared computer science major, an account will be automatically created for you at the start of the second week of classes.

Be a good steward of compute resources:

  • If you do not need GPU compute, select the compsci partition and 0 GPUs. If you do need GPU compute, select the compsci-gpu partition and request 1 GPU.
  • Use the default values of 10 GB of RAM and 1 core.
  • Select the number of hours that you expect to be working in this session; you can renew the session later if you need more time. Do not request long sessions (e.g., 24 hours) when you only expect to be working for a couple of hours.
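Once a session is running, you may want to confirm whether a GPU is actually visible to it. The following is a minimal sketch, assuming PyTorch may or may not be installed in the environment; the function name describe_compute is hypothetical.

```python
import importlib.util


def describe_compute():
    """Return a short description of the compute that PyTorch can see."""
    # Avoid an ImportError if PyTorch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch

    if torch.cuda.is_available():
        # Device index 0 is the first GPU visible to this session.
        return f"GPU available: {torch.cuda.get_device_name(0)}"
    return "No CUDA GPU visible; running on CPU"


print(describe_compute())
```

If the message reports CPU only in a session where you requested a GPU, check the partition and GPU count you selected.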

Option 3: Installing on your personal device

You are not required to install anything on your personal device, but if you are interested in machine learning outside of the course, you may want to do so for the added flexibility and customizability.

There are multiple package managers for Python. For scientific computing and machine learning, I recommend Anaconda’s Individual Edition. It is a free distribution containing Python, Jupyter, and (nearly) everything for data science in Python, and you can use it to install and upgrade anything else. Choose the appropriate download for your device. If you have trouble installing, check the Anaconda documentation. If everything has gone correctly, you can open JupyterLab with the terminal command “jupyter lab” or by selecting the program from the Anaconda Navigator.
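After installing, one way to check that the course libraries are importable is a short script like the one below. It is a sketch only: the helper name check_packages is hypothetical, and the import names numpy, sklearn, and torch are the standard module names for NumPy, Scikit-Learn, and PyTorch.

```python
import importlib.util
import sys

# Display name -> module name used in an import statement.
COURSE_PACKAGES = {"NumPy": "numpy", "Scikit-Learn": "sklearn", "PyTorch": "torch"}


def check_packages(packages):
    """Map each display name to True if its module can be found, else False."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in packages.items()
    }


print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
for name, found in check_packages(COURSE_PACKAGES).items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Anything reported as missing can then be installed with your package manager.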

Note that you are responsible for managing your own device. With either of the first two options, you can do all of your coursework on the containers from any computer, so problems with your personal device or installation will not be considered an excuse for late submission of work.

Next Steps: Introduction to Jupyter and Python

Once you can open JupyterLab, either on a container or on your personal device, create your first Jupyter notebook file (a .ipynb file; select Python if given a choice). Then review the Jupyter Notebook documentation to familiarize yourself with the interface.

If you haven’t already, now would also be a good time to review the recommended background material, especially on Python and NumPy. If you are new to Python and would like more learning resources, you might use the official long-form tutorial, or the quick cheatsheet if you prefer to learn from short examples and experimentation.

Keep in mind that you do not need to master all aspects of Python at once. Begin by translating concepts you already know from your primary programming language into Python (loops, conditionals, functions, classes, strings, lists, etc.).
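As a starting point, the short example below touches each of those concepts in one place. It is only an illustrative sketch; the class and function names are made up for this example.

```python
class WordCounter:
    """A tiny class that tallies how often each word appears."""

    def __init__(self):
        self.counts = {}  # dictionary mapping word -> number of occurrences

    def add(self, word):
        self.counts[word] = self.counts.get(word, 0) + 1


def count_long_words(text, min_length=4):
    """Count the words in text that have at least min_length characters."""
    counter = WordCounter()
    # String methods: lower() normalizes case, split() returns a list of words.
    for word in text.lower().split():  # loop over a list
        if len(word) >= min_length:  # conditional
            counter.add(word)
    return counter.counts


counts = count_long_words("The quick brown fox jumps over the lazy dog")
print(sorted(counts))  # sorted list of the words that were counted
```

Try rewriting it the way you would in your primary language, then compare the two versions.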