In this assignment, you are going to parallelize two different
codes for diffusion on a ring (i.e., a 1D diffusion equation with
periodic boundary conditions).
- One code, 'diffring', is the numerical diffusion code of
assignment 7, which solved the PDE using a
BLAS call implemented by the OpenBLAS library.
- The other, 'diffusion1d', uses an
alternative method that does not use BLAS but computes the
matrix-vector operations using for loops. You can get this code on the
teach cluster with
$ git clone /scinet/course/phy1610/diffusion1d
The aim is to speed up both codes using shared-memory parallelism.
For the first code, this entails figuring out how to unlock OpenBLAS's
multithreaded capability. The second code will require adding
OpenMP directives.
Note that the parameter values used in assignment 7 would lead to rather short
run times, which would make the speedups hard to determine with any
statistical confidence. The spatial and temporal resolution parameters in the
params.ini file in the diffusion1d repo therefore have lower values for dx and
dt than in the previous assignment: dx=0.005 and dt=0.000005. Make sure to use
the new params.ini file for diffring as well.
Update (March 24): These parameters are too small for diffring; better parameters are dx=0.015 and dt=0.00007.
- We'll assume you've already compiled the first code.
- Speed up this first code by enabling multithreading in OpenBLAS. This
may mean you need to reinstall OpenBLAS. (Hint: read the OpenBLAS documentation. It is possible that your OpenBLAS was already multithreaded, in which case you should figure out how to control the number of threads that it uses.)
- Compile the second code on the teach cluster with the modules 'gcc',
'boost', and 'rarray' loaded.
- Parallelize the second application using OpenMP directives. Remember to always use default(none) in
'#pragma omp parallel' directives.
- Write two or more job scripts to perform the scaling analyses for
diffring and for diffusion1d on
compute nodes of the teach cluster, i.e., write scripts that run the applications
with the number of threads ranging from 1 to 16 and record
the timings. These
job scripts should be submitted to the scheduler.
- Make a plot of the speedups, and, assuming Amdahl's law holds, try to estimate the serial fraction f for both applications.
- Which of the two codes performs better?
As in previous assignments, you should still use git version control
for the codes and makefiles, and for the job scripts and the scaling
analysis. Keep the git repos for the diffring code, for the
diffusion1d code, and for the scaling analysis separate (the latter
should include a report with the speed-up plots, tables containing
the timing data, and the determination of the serial fraction for both
codes).
Please submit three files, corresponding to the git2zip-ed repos.
Submission deadline is March 29, 2020 at 11:55 PM. The usual penalty for late submissions applies.
Last Modified: Friday Mar 27, 2020 - 19:27. Revision: 5. Release Date: Thursday Mar 19, 2020 - 23:00.