Starting May 1st, Coursera will be offering a High-Performance Scientific Computing class from Randall J. LeVeque of the University of Washington:

From the course description:

Programming-oriented course on effectively using modern computers to solve scientific computing problems arising in the physical/engineering sciences and other fields. Provides an introduction to efficient serial and parallel computing using Fortran 90, OpenMP, MPI, and Python, and software development tools such as version control, Makefiles, and debugging.

About the Course:  Computation and simulation are increasingly important in all aspects of science and engineering. At the same time writing efficient computer programs to take full advantage of current computers is becoming increasingly difficult. Even laptops now have 4 or more processors, but using them all to solve a single problem faster often requires rethinking the algorithm to introduce parallelism, and then programming in a language that can express this parallelism.  Writing efficient programs also requires some knowledge of machine arithmetic, computer architecture, and memory hierarchies.


Filed Under (Optimization, Programming, Websites) by John Pormann, Ph.D. on 28-01-2013

There’s a great article on profiling techniques for HPC over at Admin Magazine.  The author, Jeff Layton, lays out the differences between profiling and tracing, and then provides links to a wealth of tools:  processor tracing tools, system profiling tools, system tracing tools, and even MPI profiling/tracing tools.

“Knowing your application is one of the keys to being able to improve it and, perhaps most importantly, being able to judge which architecture (or architectures) you should think about using. In essence, “knowing yourself” from an application perspective. This ability is very important in the current climate, where non-x86 processors are on the rise and where accelerators are also becoming more commonplace and diverse.”

We’ve posted another web-based video on the wiki/Training page:

“Intro to OpenMP” covers the basic use of OpenMP — a programming environment for parallel application development.  OpenMP is a set of compiler “directives” for C/C++ and Fortran, so rather than having to learn a whole new programming language, you can continue to code in a familiar language and just add directives (instructions/hints to the compiler) on where you think parallelism could be extracted.  A simple example:

#pragma omp parallel for
for(i=0;i<N;i++) {
    y[i] = alpha * x[i] + y[i];
}

The addition of that one “pragma” statement converts this single-CPU loop into a multi-CPU/parallel loop.  Take a look at the video and see how to add multi-CPU/multi-core parallelism to your existing applications.
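For readers who want something they can compile, here is a minimal sketch of the same loop wrapped in a C function. The function name saxpy and its argument order are illustrative choices, not taken from the video. Note that in C/C++ the directive is spelled "parallel for"; "parallel do" is the Fortran form.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper (name and signature are assumptions):
 * computes y[i] = alpha * x[i] + y[i] for i in [0, n). */
void saxpy(int n, float alpha, const float *x, float *y)
{
    int i;

    assert(x != NULL && y != NULL);

    /* Each iteration touches only its own y[i], so the iterations are
     * independent and can be split across threads. A compiler without
     * OpenMP enabled ignores the pragma and runs the loop serially. */
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

With GCC, OpenMP is enabled with the -fopenmp flag, and the number of threads can be set at run time through the OMP_NUM_THREADS environment variable. Other compilers use their own flag, so check the documentation for the one you are using.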


Filed Under (DSCR, Optimization, Training) by John Pormann, Ph.D. on 22-08-2011

We’ve posted another web-based video on the wiki/Training page:

“Using the Intel Libraries” covers the Intel Math Kernel Library (MKL) and the Intel Integrated Performance Primitives library (IPP).  These libraries were licensed along with the Intel C/C++ and Fortran compilers, and can be used by anyone on the DSCR.  In some cases, you may just have to link against the library to get a performance boost out of your existing code (e.g. BLAS and LAPACK functions).

Take a look at the video and see if either of those libraries might have functions that you can use in your own research.  This can save you a great amount of time and effort in terms of code development — and as a bonus, many of the routines offer free parallelism (multi-core/single-machine).

Filed Under (Multicore, Optimization, Parallel Computing, Software Development, Training) by John Pormann, Ph.D. on 11-08-2011

We will be offering the following seminars and workshops on scalable computing techniques in Fall 2011:

We have intentionally scheduled just a few seminars this semester to leave time open for any “on-demand” training that you might request. Our Training wiki-page has a list of seminars that we currently teach, or have taught recently, and if any of them look applicable to your research lab, or to a small group of students, we’d be happy to make arrangements to teach it. Email us at scsc at duke edu if you’d like to set something up.

Filed Under (Optimization, Software Development, Training, Websites) by John Pormann, Ph.D. on 22-07-2011

From HPCwire: “Software engineering is still something that gets too little attention from the technical computing community, much to the detriment of the scientists and engineers writing the applications. Greg Wilson has been on a mission to remedy that, mainly through his efforts at Software Carpentry, where he is the project lead. HPCwire asked Wilson about the progress he’s seen over the last several years and what remains to be done.”

Great interview with Greg over at:

Filed Under (Optimization, Programming) by John Pormann, Ph.D. on 15-06-2011

A new paper out of Google shows some interesting data for performance of C++, Java, Go, and Scala —

The raw data is in Figure 8 of the paper:  C++ was the fastest (23 sec run-time for their application); 64-bit Java was 5.8x slower (134 sec); 32-bit Java was 12.6x slower, unless you tweaked the garbage collection process; with some tweaking, they could get 32-bit Java to be only 3.7x slower than C++.

Since Java has built-in support for some parallel programming concepts (basic synchronization, threads), there is some reason to believe it could be “easier” to program multi-threaded Java apps — then you could use multi-core CPUs to regain the lost performance.  Possibly.

If you’re running Java code on the DSCR, try using the following to force it to run in 64-bit mode:

java -d64 ...

All the other standard Java command-line arguments should work.  See if your program runs faster (and let us know!).

Filed Under (Optimization, Programming, Training) by John Pormann, Ph.D. on 11-01-2011

We have a pair of videos up now explaining how to use the Intel compiler suite that is installed on the new CentOS 5 machines in the DSCR.  Part 1 covers basic optimizations which require no changes to your code (just command-line options to the compiler), and Part 2 covers optimizations which require minimal changes to your code: no major hacking required, just “pragma” lines placed in front of loops.
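The videos' own examples are not reproduced here. As one illustration of what a "pragma" line in front of a loop can look like, the sketch below uses the Intel compiler's ivdep pragma, which tells the compiler to ignore dependencies it only assumes, not proves, when deciding whether to vectorize. The function name scale_from_offset is hypothetical, and the pragma is only safe here because of the stated assumption that k is non-negative.

```c
#include <assert.h>

/* Hypothetical example: a[i] = a[i + k] * c for i in [0, m).
 * The caller must provide at least m + k elements in a. */
void scale_from_offset(int *a, int k, int c, int m)
{
    int i;

    /* With k >= 0 every read is at or ahead of the write position, so
     * no iteration depends on a value written by an earlier one. The
     * compiler cannot know k's sign at compile time, so it assumes a
     * dependence and may refuse to vectorize without the hint. */
    assert(k >= 0);

    /* __INTEL_COMPILER is defined by the classic Intel compilers;
     * other compilers skip the pragma and compile the plain loop. */
#ifdef __INTEL_COMPILER
#pragma ivdep
#endif
    for (i = 0; i < m; i++) {
        a[i] = a[i + k] * c;
    }
}
```

The hint changes only how the loop is compiled, not what it computes, provided the assumption holds. If k could be negative, the pragma would be asserting something false and should be removed.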

We’ve also started a page to show some of the problem domains that are covered by the Intel MKL and IPP libraries.  MKL primarily covers linear algebra operations; IPP covers signal/audio processing, image/video processing, realistic rendering, and cryptography.  Many of the MKL and IPP routines are multi-core (parallel) ready.

Filed Under (Multicore, Optimization, Parallel Computing, Programming, Training) by John Pormann, Ph.D. on 30-12-2010

Victor Eijkhout from the Texas Advanced Computing Center (TACC) at the University of Texas at Austin has released a new HPC book at

Also, James Leigh, John McHugh and Sanjay Goil have put out an eBook on HPC development:

The first book is more appropriate for more programming-savvy users — it delves into CPU registers, cache hierarchy, memory bus architectures, etc.  While all of those topics are essential for fully optimizing your application, they may require significant rewriting of your code.

The second book is higher-level.  It does include (Java) examples, but talks more about parallel programming concepts.

Filed Under (DSCR, Optimization, Training) by John Pormann, Ph.D. on 19-11-2010

We are continuing to develop new web-based video training modules — the next one up is an introduction to the Intel compiler suite:

With the new CentOS 5 operating system, we have installed the latest releases of the Intel optimizing compilers: C, C++, Fortran 77, and Fortran 90/95.  In many cases, a simple re-compilation of your program can result in a 20-50% improvement in performance, as the Intel compiler is better able to shuffle streams of mathematical operations to obtain the best use of all the CPU’s resources, including SSE/AVX functions.  The video covers basic optimization (e.g. -O3), interprocedural optimization, profile-guided optimization, and compiler reporting options.

We will continue to expand our web-based training so if you have specific topics you’d like us to cover, please email us at scsc at duke edu.