The Duke Compute Cluster consists of machines that the University has provided for community use and that researchers have purchased to conduct their research. The CHSI has high-priority access to 31 servers on the Duke Compute Cluster, comprising 8 virtual nodes configured for GPU computation (RTX 2080 Ti GPU, 1 CPU core [2 threads], 16 GB RAM) and 23 non-GPU virtual nodes for CPU and/or memory intensive tasks (42 CPU cores [84 threads], 700 GB RAM). All nodes utilize Intel Xeon Gold 6252 CPUs @ 2.10GHz.
The operating system and the software installation and configuration are standard across all nodes (barring license restrictions); Red Hat Enterprise Linux 7 is the current operating system. SLURM is the scheduler for the entire system. Software is managed with the “module” command and, increasingly, through Singularity containers, which package an entire software environment and greatly increase reproducibility.
Below are a few useful resources for getting started using the Duke Compute Cluster (DCC):
DCC User Guide
Register for an upcoming DCC training session
Dr. Granek’s guide to running RStudio / Jupyter using Singularity
Connecting to the DCC
To connect to the cluster, open a Secure Shell (ssh) session from a terminal using your Duke NetID (replace “NetID” below with your NetID).
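The login command itself did not survive in this copy of the page. As a minimal sketch, assuming the DCC login host is dcc-login.oit.duke.edu (an assumption not stated in this text; confirm it in the DCC User Guide), the following builds and prints the command for a given NetID:

```shell
# Replace "NetID" with your own Duke NetID.
NETID="NetID"
# Assumption: the login hostname is not given in this text; confirm it in the DCC User Guide.
DCC_LOGIN_HOST="dcc-login.oit.duke.edu"
SSH_COMMAND="ssh ${NETID}@${DCC_LOGIN_HOST}"
# Print the command; run the printed line in your terminal to connect.
echo "${SSH_COMMAND}"
```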
After running the command you will be prompted for your password. If you are on campus (University or DUHS networks) or connected to the Duke network via the Duke VPN, you will then be connected to a login node. If you are off campus and not connected via the Duke VPN, logging in requires two-factor authentication, and you will also be prompted for your DUO passcode.
Running Jobs on CHSI Nodes
To run commands on compute-intensive nodes, include the command line options -A chsi -p chsi to specify the “chsi” account (-A chsi) and the “chsi” partition (-p chsi). For example:
srun -A chsi -p chsi --pty bash -i
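The same account and partition options can also be placed in a batch script for non-interactive work. The following is a minimal sketch, not a CHSI-provided template: --account and --partition are the long forms of -A and -p, while the job name, the output file name, and the echoed message are hypothetical.

```shell
#!/bin/bash
#SBATCH --account=chsi              # long form of -A chsi
#SBATCH --partition=chsi            # long form of -p chsi
#SBATCH --job-name=chsi-demo        # hypothetical job name
#SBATCH --output=chsi-demo-%j.out   # %j expands to the job ID

# Slurm sets SLURM_JOB_ID inside a job; fall back to a label when run outside Slurm.
JOB_MESSAGE="Job ${SLURM_JOB_ID:-(not under Slurm)} running on $(uname -n)"
echo "${JOB_MESSAGE}"
```

Saved under a name such as chsi-demo.sh (hypothetical), the script would be submitted with sbatch chsi-demo.sh.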
To run commands on GPU nodes, include the command line options -A chsi -p chsi-gpu --gres=gpu:1 --mem=15866 -c 2, where
-A chsi specifies the “chsi” account
-p chsi-gpu specifies the “chsi-gpu” partition (the nodes that have GPUs)
--gres=gpu:1 requests a GPU
--mem=15866 -c 2 requests all available memory (15866 MB) and CPUs on the node, since only one job can run on a GPU node at a time
For example:
srun -A chsi -p chsi-gpu --gres=gpu:1 --mem=15866 -c 2 --pty bash -i
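Once the interactive session starts, it can be worth confirming that the allocation matches the request. The following is a minimal sketch, assuming Slurm exports its usual job environment variables and sets CUDA_VISIBLE_DEVICES for --gres=gpu jobs (which depends on how the cluster is configured); the function name describe_allocation is hypothetical.

```shell
# Hypothetical helper: print what Slurm reports about the current allocation.
describe_allocation() {
    echo "partition: ${SLURM_JOB_PARTITION:-unknown}"
    echo "cpus per task: ${SLURM_CPUS_PER_TASK:-unknown}"
    echo "memory per node (MB): ${SLURM_MEM_PER_NODE:-unknown}"
    echo "visible GPUs: ${CUDA_VISIBLE_DEVICES:-none}"
}
describe_allocation
```

Outside a Slurm job the variables are unset, so the function prints the fallback values instead.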
Storing data on the cluster
There are several different storage options on DCC. Most are discussed at https://rc.duke.edu/dcc/cluster-storage/, but there are a few CHSI-specific details below. Please read both this section and the cluster storage information carefully, and be mindful of how you use storage on the cluster.
Shared Scratch Space
The Shared Scratch Space is mounted at /work. To use the shared scratch space, make a subdirectory of /work with your NetID and store your files in that subdirectory. For example, if your NetID were “jdoe28” you would use the command
mkdir /work/jdoe28
Group Storage
CHSI has 1 TB of storage at /hpc/group/chsi. While 1 TB seems like a lot, it fills up fast, so please be mindful of how you use this space. Group storage is NOT appropriate for long-term storage of large datasets. To use the group storage, make a subdirectory of /hpc/group/chsi with your NetID and store your files in that subdirectory. For example, if your NetID were “jdoe28” you would use the command
mkdir /hpc/group/chsi/jdoe28
Local Scratch
Each of the nodes in the chsi partition has an 8 TB SSD mounted at /scratch. This is in addition to (and different from) the Shared Scratch Space that is at /work. Because /scratch is local to the node, it is potentially faster than the DCC shared storage (Group Storage, Home Directory, and Shared Scratch). However, because /scratch is local to a node, anything stored there is only available on that node. In other words, if you run a job on dcc-chsi-01 and save output to /scratch, it will not be accessible from dcc-chsi-02. As with Shared Scratch, /scratch is not backed up and files are automatically deleted after 75 days.
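Because local scratch is only visible on one node, a common pattern is to copy inputs to /scratch at the start of a job, compute there, and copy results back to shared storage before the job ends. The following is a minimal sketch of that pattern, not a CHSI-provided script: the function name stage_through_scratch is hypothetical, and the line count stands in for a real computation.

```shell
# stage_through_scratch INPUT_FILE OUTPUT_DIR [SCRATCH_ROOT]
# Hypothetical helper: copies INPUT_FILE to node-local scratch, runs a
# placeholder computation there, and copies the result to OUTPUT_DIR.
stage_through_scratch() {
    local input_file="$1" output_dir="$2" scratch_root="${3:-/scratch}"
    local job_dir status=0
    # Per-job working directory so concurrent jobs on a node do not collide.
    job_dir="$(mktemp -d "${scratch_root}/${USER:-user}.XXXXXX")" || return 1
    cp "${input_file}" "${job_dir}/input" || { rm -rf "${job_dir}"; return 1; }
    # Placeholder for the real computation: count the lines of the input.
    wc -l < "${job_dir}/input" > "${job_dir}/line_count.txt"
    # Copy results to shared storage before the job ends.
    { mkdir -p "${output_dir}" && cp "${job_dir}/line_count.txt" "${output_dir}/"; } || status=1
    # Clean up so the local SSD does not fill with stale files.
    rm -rf "${job_dir}"
    return "${status}"
}
```

With the placeholder NetID used above, a call might look like stage_through_scratch /hpc/group/chsi/jdoe28/input.txt /hpc/group/chsi/jdoe28/results, where both paths are hypothetical.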
Archival Storage
The best archival storage option for CHSI users is currently Duke Data Service (DDS), which offers free, unlimited storage. It is not mounted on DCC, but there is a command line tool for moving data to and from DDS. DDS is also a convenient way to move data around campus.
None of the other options discussed above are appropriate for archival storage. Local and Shared Scratch are for short-term storage during computation. Our group storage at /hpc/group/chsi/ is limited to 1 TB, which fills up quickly. It is possible to purchase archival storage on Data Commons, but we do not currently have plans to do this.
How Much Space Am I Using
The du command tells you how much space is being used by a directory and its sub-directories. The following command will show the usage of jdoe28’s sub-directory on the group storage and each of its sub-directories:
du --all --human-readable --max-depth 1 /hpc/group/chsi/jdoe28
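When a directory has many entries, sorting the output makes the largest ones easier to spot. The following is a minimal sketch, assuming GNU du and sort (as on a typical Linux system); the function name usage_by_size is hypothetical.

```shell
# Hypothetical helper: same du report as above, sorted largest first.
usage_by_size() {
    du --all --human-readable --max-depth 1 "$1" | sort --reverse --human-numeric-sort
}
# Example (commented out; jdoe28 is the placeholder NetID used above):
# usage_by_size /hpc/group/chsi/jdoe28
```

The first line of the output is normally the total for the directory itself, since it is at least as large as any entry inside it.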
The following will tell you how much space is used and available on the group storage:
df -h | grep -E 'chsi|Filesystem'
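To check the group storage from a script, for example before writing large outputs, the percentage in use can be read from df. The following is a minimal sketch, assuming POSIX df -P output (capacity in the fifth column); the function name percent_used and the 90% threshold are hypothetical.

```shell
# Hypothetical helper: print the percentage of space in use on the
# file system that holds the given path, as a bare integer.
percent_used() {
    # -P gives one line per file system; column 5 is capacity, e.g. "83%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
# Example (commented out): warn when the group storage is nearly full.
# [ "$(percent_used /hpc/group/chsi)" -ge 90 ] && echo "Group storage is over 90% full"
```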