MPI Cluster


Dynamo can be run as a standalone on a cluster of CPUs. This works for alignment and classification projects. Using the Dynamo standalone on a CPU cluster requires some additional steps compared to execution on a single server during an interactive session:

  1. Compile Dynamo specifically for your cluster.
  2. Create a cluster header file that tells Dynamo about the syntax expected by your queuing system.
  3. Each time you create a project, tell it to use the cluster header to produce a project execution script (extension .sh).
  4. Submit the execution script representing the project to your cluster.

Compilation

The executables delivered in the Dynamo distribution should work on a Linux workstation for parallel jobs run on the same machine (through OMP threads). They are NOT enough to run parallel jobs on a cluster of different machines, which requires the MPI libraries. Thus, compiling Dynamo on your cluster requires an additional step involving a C compiler available on your cluster that links against the MPI libraries. On most systems, you can run the command:

module avail

on the shell of the login node of your cluster to check the available modules that you can switch on. Modules for parallel computation will typically include an MPI-enabled compiler. You need to load one of them, for instance:

module load mpiCC

This should add some compilers to your path. They are typically called mpiCC, mpicc... It is a good idea to check the availability and syntax of the compiler provided by the module you just loaded.

which mpicc

should give you the complete path to a compiler called mpicc on your $PATH. If this is not the case, try alternative spellings (mpiCC, etc.).
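A quick way to check what the wrapper actually invokes is sketched below; note that the flags depend on your MPI distribution (--version is generally forwarded to the underlying compiler, -show is MPICH-style, and Open MPI uses --showme instead):

mpicc --version   # report the version of the compiler being wrapped
mpicc -show       # print the underlying compile command (MPICH-style)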

With some luck, your cluster environment will have an information system (like a webpage) that tells you which modules you are expected to use for MPI-compatible compilation, and the compilers attached to them.

Once you know the name of the compiler that you are going to use (say, mpicc), you can proceed to compile the MPI executables:

cd <DYNAMO_ROOT>/mpi
source dynamo_compile_mpi.sh mpicc

Note that you pass the name of the compiling command as the second argument of source. If you get an error during compilation, try with a different module, or with a different compiler inside the same module.
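Such a retry could look like the sketch below (openmpi is a hypothetical module name here; check module avail for the names actually offered on your system):

module unload mpiCC                # drop the module that failed
module load openmpi                # hypothetical alternative module
which mpicc                        # confirm the new wrapper is on $PATH
source dynamo_compile_mpi.sh mpicc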

Cluster header file

A cluster header file allows Dynamo to produce an execution script for a project that will be understood by the specific syntax of your cluster. You have several examples of cluster header files in the <DYNAMO_ROOT>/mpi folder of your Dynamo installation.
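The authoritative format is the one used in those shipped examples, which you should take as your starting point. Conceptually, the header carries the queue directives that end up at the top of the generated .sh script; on a SLURM cluster, those directives would resemble the following sketch (all values are placeholders, not Dynamo syntax):

#!/bin/bash
#SBATCH --job-name=myProject    # label shown by the queue
#SBATCH --ntasks=128            # one MPI task per requested core
#SBATCH --time=24:00:00         # walltime, if your cluster enforces one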

Preparing a project

You first need to tune several parameters in your project, through the GUI or the command line.

GUI

After opening a project in the dcp GUI, switch to the Computing Environment GUI and:

  • Make certain that the cluster MPI option is dialed as the computing destination.
  • Select the number of cores in the CPU cores field. Each one will be handled by a separate MPI task.
  • Make certain that the Parallelized averaging step in the bottom panel is set to zero. This option only applies to Matlab-based computations.
  • Pass the path to the cluster header file.

Other secondary options are the walltime (the maximum time allowed to the job, occasionally required by some clusters) and the submission order. If you pass an explicit syntax for the submission order, you can submit jobs with the run option of the GUI instead of using the command line.

Command line

All the steps above can be performed through the command line, using the names of the project parameters. You can follow the examples below, where a project called myProject gets its parameters tuned with the command dvput:

  • dvput myProject destination mpi
  • dvput myProject cores 128
  • dvput myProject mwa 0
  • dvput myProject cluster myClusterHeader.sh

Remember that the Dynamo command dvhelp will list the different project parameters that can be edited by the user through dvput.

Executing the project

Once the project has the right parameters, you can unfold it normally to produce an execution script, which you then submit from the command line. The concrete syntax may change depending on the queuing system controlling your cluster; typical examples are:

qsub myProject.sh (in PBS queues)

or

sbatch myProject.sh (in SLURM queues).

It is a sane policy to check the contents of the execution script myProject.sh before submitting it, in order to verify that everything went smoothly and that Dynamo was able to use the cluster header file to convert your project into a text file with the right syntax for your queue.
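A minimal check-then-submit sequence on a SLURM cluster would thus be:

cat myProject.sh      # inspect the queue directives generated by Dynamo
sbatch myProject.sh   # submit; use qsub instead on a PBS cluster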

Performance

In some clusters, following the above procedure without further tuning can lead Dynamo to show very slow performance. This is normally related to the fact that Dynamo as a standalone works on the MCR libraries. These libraries might need some tuning for your system.

This can be done by making certain that the MCR_CACHE_ROOT variable is set to a fast file share on your system. Most parallel clusters offer users an area of the disk called scratch, with especially good I/O performance. If that is the case, it is a good idea to make certain that your project will have its MCR_CACHE_ROOT tuned to (a subfolder of) that location. You can do this by just editing the execution script with something like:

mkdir $SCRATCH/temporal
export MCR_CACHE_ROOT=$SCRATCH/temporal

You can also insert this in your cluster header file, or use the mcr parameter of the project to instruct a particular project to use a particular location.
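Assuming the mcr parameter takes a path like the dvput examples above (a hedged sketch; check dvhelp for the exact value syntax), this could read:

dvput myProject mcr $SCRATCH/temporal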

Several MCR extraction folders

Dynamo includes the experimental option of selecting a different MCR_CACHE_ROOT for each of the spawned MPI tasks. You should try this option if you notice exceedingly slow performance.

Parallelization system

If you have M particles and indicate N cores, Dynamo will assign a priori M/N particles to each core. Each core is governed by a different MPI task. For example, with 12,800 particles on 128 cores, each task aligns 100 particles.

The current system will be changed in future releases to allow for dynamic assignment of particles during the process.

Additionally: currently the alignment step is distributed among all the available tasks, but the averaging itself is processed by a single core. In most cases this is not a problem, as averaging typically takes a small fraction of the time compared to alignment. It can, however, become a problem when you have several thousand particles and the alignment step only covers a few Euler triplets.

Algorithms

The algorithms in the MPI version are the same as those used in the Matlab or Standalone versions.

Using a cluster under Matlab

This is a totally different scenario. If your cluster supports running Matlab jobs through the Distributed Computing Engine, that's perfect. You don't need to use the MPI version of Dynamo: no need to compile the MPI executables, no need to design a cluster header file. You just use the destination parameter matlab_parfor and Matlab will take care of everything.
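In the dvput syntax shown above, this selection reads:

dvput myProject destination matlab_parfor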

Additionally, in this setting the averaging step is parallelized exactly like the alignment step.

Note that the engine should be deployed and maintained on your cluster. Having a single license on the login node (even if the Parallel Computing Toolbox is active there) will NOT be sufficient to run Dynamo-Matlab jobs on your cluster. Unfortunately, not many institutions offer Distributed Computing Engine licenses.