Setup

This class uses Python and several scientific computing libraries, including NumPy, Scikit-Learn, and PyTorch. The basic development and execution environment will be Jupyter, a scientific computing platform that supports Python and other languages (we will use only Python). All of these software tools are free, and their usage will be demonstrated in class.

You have two options for developing and executing code: you can use JupyterLab in the browser, hosted on Duke OIT or CS department servers, or you can install the relevant software on your own device. Directions for each are included below. The choice is yours, but if you install on your own device, you are responsible for managing your own environment.

Using JupyterLab in the browser with containers [no install]

This option requires no installation: you use JupyterLab in the browser, hosted on a container maintained by Duke OIT. To use this resource:

  1. Open a web browser and navigate to https://cmgr.oit.duke.edu/containers. Log in with your NetID.
  2. The first time you access the container, you will need to request a reservation under the “Reserve a container” tab. You should see several options under “Reservations available.” Select the “Pytorch JupyterLab with Pytorch for Data Science and Machine Learning” option.
  3. From then on, whenever you return to the above URL, you can select your container under reservations and click the blue “Login” button. This should launch a JupyterLab view in your browser. If you open a Jupyter notebook (.ipynb file), you should be able to run code directly in the browser.

Later in the course we will discuss GPU utilization for efficient training and inference with large neural networks. We will provide directions at that point for how to access a similar interface to the above but hosted on CS department computers with GPU access enabled.
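
In the meantime, a notebook cell along the following lines is one way to see whether PyTorch can find a GPU in whatever environment you are using. This is only a sketch: the helper name `describe_compute_device` is made up for illustration, and the result depends entirely on the machine the notebook runs on.

```python
import importlib.util


def describe_compute_device():
    """Return "cuda" if PyTorch reports a usable GPU, "cpu" if it does not,
    or None if PyTorch is not installed in this environment.

    The function name is hypothetical; torch.cuda.is_available() is the
    PyTorch call doing the actual check.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the cell still runs without PyTorch

    return "cuda" if torch.cuda.is_available() else "cpu"


print("Compute device:", describe_compute_device())
```

A result of "cpu" is entirely sufficient for the early part of the course.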

Installing on your personal device

You are not required to install anything on your personal device, but if you plan to do machine learning outside of the course, you may want to for the added flexibility and customizability.

There are multiple package managers and distributions for Python. For scientific computing and machine learning, I recommend Anaconda’s Individual Edition. It is a free distribution containing Python, Jupyter, and (nearly) everything for data science in Python, and you can use it to install and upgrade anything else. Choose the appropriate download for your device. If you have trouble installing, check the Anaconda documentation. If everything has gone correctly, you can open JupyterLab with the terminal command “jupyter lab” or by selecting the program from Anaconda Navigator.

Note that you are responsible for managing your own device. You can do all of your coursework on the containers from any computer with a web browser, so problems with your personal device or installation will not be considered an excuse for late submission of work.

Next Steps: Introduction to Jupyter and Python

Once you can open JupyterLab, either on a container or on your personal device, create your first Jupyter notebook file (a .ipynb file; select Python if given a choice). Then review the Jupyter Notebook documentation to familiarize yourself with the interface. We will also demonstrate this in class.
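
As a first cell in your new notebook, you might run something like the following to confirm that the libraries named at the top of this page can be imported. It is a minimal sketch, not an official course check: the helper `check_libraries` and the `COURSE_LIBRARIES` dictionary are names invented for this example, and the versions you see will depend on your container or installation.

```python
import importlib
import importlib.util
import sys

# Import names for the libraries mentioned in the setup notes.
# Scikit-Learn is imported as "sklearn" and PyTorch as "torch".
COURSE_LIBRARIES = {"NumPy": "numpy", "Scikit-Learn": "sklearn", "PyTorch": "torch"}


def check_libraries(libraries=COURSE_LIBRARIES):
    """Map each library's display name to a version string,
    an "import failed" message, or None if it is not installed."""
    report = {}
    for display_name, module_name in libraries.items():
        # find_spec returns None when the top-level module cannot be located.
        if importlib.util.find_spec(module_name) is None:
            report[display_name] = None
            continue
        try:
            module = importlib.import_module(module_name)
        except Exception as exc:  # a broken install can raise more than ImportError
            report[display_name] = f"import failed: {exc}"
            continue
        report[display_name] = getattr(module, "__version__", "unknown")
    return report


print("Python", sys.version.split()[0])
for name, version in check_libraries().items():
    print(f"{name}: {version if version is not None else 'NOT FOUND'}")
```

If a library is reported as not found on a personal installation, installing it through Anaconda is the usual fix; on the containers, all three should already be present.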

Finally, if you are new to Python, begin familiarizing yourself with it. You might use the official long-form tutorial, or the quick cheatsheet if you prefer to learn from short examples and experimentation. Either way, you do not need to master every aspect of Python at once. Begin by learning how to translate concepts you already know from your primary programming language into Python (loops, conditionals, functions, classes, strings, lists, etc.).
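
To give a sense of what that translation looks like, here is a small illustrative sketch that touches each of the concepts listed above. The `Student` class, the `describe` function, and the sample data are all made up for this example; they are not part of any course material.

```python
class Student:
    """A small class with a constructor and one method."""

    def __init__(self, name, scores):
        self.name = name        # a string
        self.scores = scores    # a list of numbers

    def average(self):
        # Conditional: guard against an empty list before dividing.
        if not self.scores:
            return 0.0
        return sum(self.scores) / len(self.scores)


def describe(student, passing=70):
    """A function with a default argument that returns a formatted string."""
    status = "passing" if student.average() >= passing else "not passing"
    # f-strings embed expressions; ":.1f" rounds to one decimal place.
    return f"{student.name}: {student.average():.1f} ({status})"


students = [Student("Ada", [90, 85, 100]), Student("Alan", [60, 72])]

# Loop over a list directly; no index variable is needed.
for s in students:
    print(describe(s))

# List comprehension: build a new list from an existing one.
names_upper = [s.name.upper() for s in students]
print(names_upper)
```

Try changing the scores or the `passing` threshold in a notebook cell and rerunning it; this kind of quick experimentation is the fastest way to get comfortable with the syntax.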