Virtualization is sometimes dismissed in high-performance computing on the assumption that it will hurt performance. In some cases the concern seems warranted: virtualization is yet another layer between a scientific application and the hardware.
Researchers at Northwestern University, Sandia National Labs, and the University of New Mexico recently ran a large-scale test of virtualized parallel computing on the Red Storm supercomputer. They ran communication-intensive, fine-grained parallel benchmarks on 4096 nodes, the largest virtualized parallel simulations run to date.
Their results suggest that the virtualization layer degrades performance by 5% or less. The benefits of virtualization (checkpointing, migration, load balancing, and insulating user code from the specifics of the hardware) may easily make up for so small a drop in performance.
ScaleMP is another option for researchers who need very large memory configurations. Our current blade-based servers can accept up to 12 memory sticks for a total of 96 GB. ScaleMP allows you to combine multiple machines into a single system image: instead of 16 separate machines, you have 1 machine with 16 times as many CPUs and 16 times as much memory. One drawback is that ScaleMP requires an InfiniBand network, a high-performance, low-latency interconnect, which is used to quickly access “remote” memory.
The current version of ScaleMP allows for up to 128 CPU cores and 4 TB of memory! The “system” appears to the user as one very large server, so there is nothing new to learn and no changes need to be made to your applications.
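To make the arithmetic concrete, here is a rough sketch of how 16 blades would add up under a single system image. The 96 GB per blade and the 128-core / 4 TB limits come from the figures above; the 8 cores per blade, the `Node` class, and the `aggregate` function are assumptions made up for this illustration and are not part of any ScaleMP tooling.

```python
from dataclasses import dataclass

# Limits quoted above for the current ScaleMP version.
MAX_CORES = 128
MAX_MEMORY_GB = 4 * 1024  # 4 TB expressed in GB


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one physical machine."""
    cores: int
    memory_gb: int


def aggregate(nodes):
    """Return (total cores, total memory in GB, fits within the limits)."""
    total_cores = sum(n.cores for n in nodes)
    total_memory_gb = sum(n.memory_gb for n in nodes)
    fits = total_cores <= MAX_CORES and total_memory_gb <= MAX_MEMORY_GB
    return total_cores, total_memory_gb, fits


# 16 blades at 96 GB each; 8 cores per blade is an assumed figure.
blades = [Node(cores=8, memory_gb=96) for _ in range(16)]
print(aggregate(blades))  # (128, 1536, True)
```

Under these assumptions the combined machine would have 128 cores and 1536 GB (1.5 TB) of memory, which reaches the core limit well before the memory limit.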
We’ve seen a growing trend towards higher memory capacity in the machines that we order for the DSCR. Unfortunately, we’re also starting to see a trend towards higher prices for the raw memory chips. For most servers — given their compact size — you end up using the densest (most expensive) memory chips and so the price can really climb.
There are now at least two options to consider for very large memory systems: RNA Networks and ScaleMP.
RNA Networks offers a kind of network-based virtual memory. You allocate pools of memory on several machines, and when one of those machines needs extra memory, it reaches out over the network and accesses the pooled memory on another server. Much as a virtual memory system on a single machine stores older, less frequently used memory pages on disk, this method pushes those pages out to the network. With 10 Gbps Ethernet or InfiniBand, RNA Networks claims 100x faster results. One installation uses 300 machines and provides an 11 TB shared memory pool.
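The pooling idea can be pictured with a toy model. This is only a bookkeeping sketch of machines lending pooled memory to one another. It is not RNA Networks' software or API, and the `PooledCluster` class, the node names, and the sizes are all invented for the example.

```python
class PooledCluster:
    """Toy model: each machine offers some memory (in GB) to a shared pool."""

    def __init__(self, contributions):
        # contributions maps a machine name to the GB it offers to the pool.
        self.free = dict(contributions)

    def borrow(self, requester, amount_gb):
        """Borrow amount_gb from the pooled memory of the other machines.

        Returns a dict mapping each lender to the GB taken from it.
        Raises MemoryError if the other machines cannot cover the request.
        """
        available = sum(gb for name, gb in self.free.items() if name != requester)
        if amount_gb > available:
            raise MemoryError("pooled memory exhausted")

        grants = {}
        remaining = amount_gb
        for name in sorted(self.free):
            if name == requester or remaining == 0:
                continue
            take = min(self.free[name], remaining)
            if take:
                self.free[name] -= take
                grants[name] = take
                remaining -= take
        return grants


cluster = PooledCluster({"node1": 16, "node2": 16, "node3": 16})
grants = cluster.borrow("node1", 24)
print(grants)        # {'node2': 16, 'node3': 8}
print(cluster.free)  # {'node1': 16, 'node2': 0, 'node3': 8}
```

A real system would move memory pages over the interconnect rather than just adjust counters, but the accounting is the same: a machine that runs short draws on memory that other servers have set aside.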