Computational Resources at Duke

Last updated 09 December 2019

There is a wide range of computing resources at Duke. The goal of this document is to point DMC members to the resources that are available and to explain the differences between them.

All of these resources require a basic competence with using the Unix command line.

Computing

Standalone Servers

gemscompute01 (decommissioned in May 2020)

DMC has a moderately sized server named gemscompute01. It is managed by the GCB IT group and hosted by DHTS. It has 16 CPUs (Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz) and 94 GB of RAM.

gemscompute01 is set up with the GCB IT Lmod system, which provides access to a large collection of installed bioinformatics software. See the section on [Lmod] for more details.

gemscompute01 is behind the DHTS firewall, so it is only accessible from the DHTS network or the DHTS VPN.

Requests for access to gemscompute01 should be made to the point of contact, Josh Granek <joshua.granek@duke.edu>.

RAPID Virtual Machines

Duke provides all regular-rank faculty with a RAPID Virtual Machine allocation through Research Toolkits. The total allotment is 4 CPUs, 40 GB RAM, and 200 GB storage, which can be allocated to a single virtual machine or divided among two or more. Using the full allotment for a single virtual machine provides a moderately sized server, which should be quite capable for many microbial bioinformatic tasks, including analysis of a MiSeq run of amplicon data.
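When planning how to divide the allotment, it can help to check that the planned machines fit within the totals quoted above. The following is a minimal sketch of that bookkeeping only; the VMRequest class and the remaining_allotment function are hypothetical names made up for illustration and are not part of Research Toolkits or any Duke tool.

```python
from dataclasses import dataclass

# Total RAPID allotment as quoted in this document.
ALLOTMENT = {"cpus": 4, "ram_gb": 40, "storage_gb": 200}


@dataclass
class VMRequest:
    """Hypothetical description of the resources planned for one VM."""
    name: str
    cpus: int
    ram_gb: int
    storage_gb: int


def remaining_allotment(vms):
    """Return what is left of the allotment after the planned VMs.

    Raises ValueError if the plan exceeds any of the limits.
    """
    remaining = dict(ALLOTMENT)
    for vm in vms:
        remaining["cpus"] -= vm.cpus
        remaining["ram_gb"] -= vm.ram_gb
        remaining["storage_gb"] -= vm.storage_gb
    over = [key for key, value in remaining.items() if value < 0]
    if over:
        raise ValueError("plan exceeds allotment for: " + ", ".join(over))
    return remaining
```

For example, remaining_allotment([VMRequest("analysis", 2, 16, 100), VMRequest("web", 1, 8, 50)]) leaves 1 CPU, 16 GB RAM, and 50 GB storage unallocated.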

You have complete control over your RAPID virtual machines, which also means that you are completely responsible for managing them and installing any software that you need.

Virtual Computing Manager Virtual Machines

Duke provides all affiliates with a virtual machine through the Virtual Computing Manager (VCM) system. These virtual machines are modest in size, with 2 CPUs, 2 GB RAM, and 47 GB of total storage. Nonetheless, they are capable of doing a fair amount of bioinformatics work with microbial data. You have complete control over your VCM server, which also means that you are completely responsible for managing it and installing any software that you need.

Clockworks Virtual Machines

Duke OIT provides a wide range of virtual machines for a fee. This service is called Clockworks.

Clusters

General Information

Duke clusters are a powerful resource for computationally demanding analyses. Each of the clusters below consists of over one thousand CPUs, some machines with very large RAM capacities, and other features beneficial for unusually demanding computational tasks. But this power comes with additional complexity: on all of the Duke clusters discussed below, compute jobs must be submitted through the Slurm Workload Manager. In addition to mastering the Unix command line, a basic requirement for all resources discussed here, users of the clusters must also master Slurm, which has a steep learning curve. Beyond Slurm, harnessing the power of a cluster requires software specifically designed for this task, or the know-how to distribute jobs across multiple machines. For many tasks one of the [Standalone Servers] will be more than sufficient, and in those cases a cluster is not worth the additional complication.
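To give a sense of what submitting through Slurm involves, the sketch below builds the text of a simple batch script. The #SBATCH options shown (--job-name, --cpus-per-task, --mem, --time, --output) are standard Slurm options, but the function name, the default values, and the example command are assumptions chosen for illustration; partition names, accounts, and resource limits are specific to each cluster and are deliberately left out.

```python
def render_sbatch_script(job_name, command, cpus=1, mem_gb=4,
                         time_limit="01:00:00"):
    """Return the text of a minimal Slurm batch script.

    All default values are illustrative assumptions, not
    recommendations for any particular Duke cluster.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={time_limit}",
        # %x expands to the job name and %j to the job ID.
        "#SBATCH --output=%x-%j.out",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: a 4-CPU job that runs a placeholder command.
script = render_sbatch_script("demo", "echo hello from Slurm",
                              cpus=4, mem_gb=8)
```

The resulting text can be saved to a file and submitted with sbatch. The job's status can then be checked with squeue.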

Duke Compute Cluster

The Duke Compute Cluster consists of hundreds of machines with over 10,000 CPUs, many with 512 GB RAM or more. It also has several machines with GPUs. The Duke Compute Cluster is freely available to members of the Duke community, but users who have purchased nodes for the cluster get priority access to those nodes.

HARDAC

HARDAC is a high-performance cluster specifically designed for the data-intensive nature of high-throughput sequence analysis, but it is smaller than the Duke Compute Cluster. HARDAC is behind the DHTS firewall, so it is only accessible from the DHTS network or the DHTS VPN. HARDAC is available to all members of the Duke University community. Members of the Duke Center for Genomic and Computational Biology have free access to HARDAC, while other Duke community members can access it for a very small fee. For information on HARDAC contact Duke Center for Genomic and Computational Biology IT staff.

Storage

Duke Data Service

Duke Data Service (DDS) offers free data archiving for Duke affiliates. Data is accessible through the DDS web interface and through the DukeDSClient command line tool. Projects stored on DDS can be shared with anyone who has a Duke NetID, so DDS is a convenient way to move data around campus.

Duke Data Commons

Duke Data Commons offers inexpensive storage that can be directly accessed from the Duke Compute Cluster and mounted on several of the Standalone Servers discussed above. Storage costs are $80/TB/year.
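As a quick budgeting aid, the quoted rate can be turned into a rough estimate. This is a minimal sketch that assumes cost scales linearly with both size and time; how Duke Data Commons actually rounds or bills partial terabytes or partial years is not stated here, and the function name is made up for illustration.

```python
COST_PER_TB_YEAR = 80.0  # USD per TB per year, as quoted in this document


def storage_cost(terabytes, years=1.0):
    """Estimate storage cost in USD, assuming simple linear scaling."""
    if terabytes < 0 or years < 0:
        raise ValueError("terabytes and years must be non-negative")
    return terabytes * years * COST_PER_TB_YEAR


# Example: 5 TB kept for 3 years is 5 * 3 * 80 = 1200 USD.
estimate = storage_cost(5, years=3)
```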

Software

Lmod

  • OIT VMs can load RNA-Seq VM that OMMICS core built
  • Find out if OIT VMs can load GCB IT lmods

There is a large selection of bioinformatics software available on gemscompute01 through the Lmod system that is managed by the GCB IT group.

A list of currently available software modules can be found by running module avail on any of the machines that use the GCB IT Lmod system.
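The module listing can be long, so it is often convenient to capture it and search it for a tool of interest. The sketch below is one way to do that from Python, under two assumptions: that the module command is defined in a bash login shell on the machine (it is a shell function, not a standalone program), and that its listing should be read from both output streams, since Lmod commonly prints to standard error. The function names are hypothetical, and the filter makes no assumption about the listing layout beyond treating it as lines of text.

```python
import subprocess


def list_modules(shell="bash"):
    """Return the text printed by 'module avail', or None if unavailable.

    Runs the command in a login shell so that the module function is
    defined; stderr is merged into stdout to capture the listing.
    """
    try:
        result = subprocess.run(
            [shell, "-lc", "module avail 2>&1"],
            capture_output=True, text=True, timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout


def find_lines(listing, keyword):
    """Return the lines of the listing that mention keyword (any case)."""
    needle = keyword.lower()
    return [line.strip() for line in listing.splitlines()
            if needle in line.lower()]
```

On a machine without the module command, list_modules returns None rather than raising an error, so the caller can check for that before searching.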

More Information

For more information and help with most of the above resources contact the Office of Information Technology (OIT) Service Desk and Duke Research Computing.

For information on HARDAC contact Duke Center for Genomic and Computational Biology IT staff.

Important Notes

Protected Data

There are options for storing and computing on protected data (including PHI), but one should assume that the resources described above do not allow use of protected data unless confirmed otherwise.