Urgent Update: Research Computing Outages
From Jan 4th to Jan 8th, most research computing resources will be unavailable for scheduled maintenance. During that time, we will complete routine maintenance, such as firmware and software upgrades, and update services to improve the DCC for all users. Most notably, the operating system on all DCC nodes will be updated from RHEL 7 to CentOS 8.
We recommend testing your existing software for compatibility with the new OS. To do so, log in from campus or over the VPN to dcc-slogin-dev-01.oit.duke.edu and submit sample jobs.
All DCC file systems and software are available on the development cluster under the same paths, so once logged in you should be able to submit jobs as usual. Compute resources on the development cluster are limited, however, so please size your sample jobs accordingly (no large CPU or memory jobs, please), and note that jobs will be time limited.
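As an illustration of a suitably small sample job, here is a minimal Python sketch that writes out a SLURM batch script with modest resource requests. The function name, the job name, and the default values (1 CPU, 2 GB of memory, 10 minutes) are assumptions chosen for the example; this notice does not state the development cluster's actual limits.

```python
def build_test_job_script(command, cpus=1, mem_gb=2, minutes=10,
                          job_name="os-compat-test"):
    """Return the text of a small SLURM batch script that runs `command`.

    The defaults are illustrative guesses at a "small" job, not
    published limits for the development cluster.
    """
    if cpus < 1 or mem_gb < 1 or minutes < 1:
        raise ValueError("cpus, mem_gb and minutes must all be at least 1")

    # SLURM accepts a time limit in hours:minutes:seconds form.
    hours, mins = divmod(minutes, 60)

    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={hours:02d}:{mins:02d}:00",
        "#SBATCH --output=%x-%j.out",  # %x = job name, %j = job ID
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical test command; replace it with the software you need to check.
    print(build_test_job_script("python --version"))
```

Saving the output to a file (for example test_job.sh) and running `sbatch test_job.sh` on the development login node would submit the job in the usual way.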
If you are a DCC group point of contact, please make sure that at least one member of your lab completes testing and provides feedback via this form.
If you have questions or run into problems while testing, please contact: firstname.lastname@example.org
What will be unavailable during the outage?
- Duke Compute Cluster (DCC)
- All research virtual machines (VMs), including Research Toolkits RAPID virtual machines
- Research computing resources (cluster and individual virtual machines) in the PRDN and Protected Network
- Globus endpoints
- PACE VMs (OIT provided CPU Resources)
- Duke Data Commons storage services
What’s going to be done during the outage?
- Storage (fe13) and server hardware firmware patching
- Installation of additional storage to keep up with demand
- For the DCC only:
  - Upgrade of SLURM software to the current stable version (likely 20.11)
  - Upgrade of the operating system to CentOS 8
  - Enhanced node security
  - Automation of GPU memory cleanup tasks
A more detailed schedule of impacts will be posted in mid-December. Questions about the outage and the changes should be emailed to email@example.com.
As a reminder, on 12/1 the symlinks to /dscrhome and /hpchome will be retired.
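To find job scripts that still rely on the retiring symlinks, something like the following Python sketch could help. It only reports lines that mention /dscrhome or /hpchome as a path prefix; the function name is hypothetical, and the sketch does not suggest replacement paths, since those are not given in this notice.

```python
import re

# Match the retired symlink names only at the start of a path, so that
# "/data/dscrhome_backup" or "/dscrhomework" are not flagged.
_RETIRED = re.compile(r"(?<![\w/])(/dscrhome|/hpchome)(?!\w)")


def find_retired_paths(text):
    """Return (line_number, stripped_line) pairs that reference a retired symlink."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if _RETIRED.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_retired.py script1.sh script2.sh ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in find_retired_paths(handle.read()):
                print(f"{path}:{number}: {line}")
```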
About Enhanced Node Security
After the January maintenance, you will only be able to access a compute node after reserving it through SLURM, and multi-factor authentication will be required to access the DCC both on and off campus. As an alternative to multi-factor authentication, users may use SSH public key authentication. To set up and enable SSH public key authentication, update your “SSH Public keys” under “Advanced User Options” at: https://idms-web.oit.duke.edu/portal
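Key pairs are typically generated with the standard OpenSSH tool ssh-keygen, and only the public half (the .pub file) should be uploaded. As a rough pre-upload sanity check, the Python sketch below tests whether a line has the shape of an OpenSSH public key: a key type, then a base64 blob whose first length-prefixed field repeats that type. The function name is hypothetical, and the check does not reflect whatever validation the portal itself performs.

```python
import base64
import binascii
import struct


def looks_like_openssh_public_key(line):
    """Return True if `line` has the shape '<type> <base64-blob> [comment]'
    and the blob's embedded key type matches the declared one."""
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        return False
    key_type, blob_b64 = parts[0], parts[1]

    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except (binascii.Error, ValueError):
        return False

    if len(blob) < 4:
        return False

    # The blob starts with a 4-byte big-endian length followed by the key type.
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded = blob[4:4 + name_len]
    return len(embedded) == name_len and embedded == key_type.encode("utf-8")
```

A private key file (which begins with a "-----BEGIN" header) fails this check, which is a useful guard against pasting the wrong file.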