Difference between revisions of "MPI Cluster"

From Dynamo
Revision as of 10:22, 18 May 2016

Dynamo can be run as a standalone application on a cluster of CPUs. This works for alignment and classification projects.

Using the Dynamo standalone on a CPU cluster requires some additional steps compared to running it on a single server during an interactive session:

  1. Compile Dynamo specifically for your cluster.
  2. Create a cluster header file that tells Dynamo about the syntax expected by your queuing system.
  3. Each time you create a project, tell it to use the cluster header to produce a project execution script (extension .sh).
  4. Submit the execution script representing the project to your cluster.

Compilation

Compiling Dynamo on your cluster requires a cc compiler that links the MPI libraries.

In most systems, you can run the command:

module avail

on the shell of your login node to check the available modules. Modules for parallel computation typically include an MPI-enabled compiler. You need to load one of them, for instance:

module load mpiCC

This should add some compilers to your path. They are typically called mpiCC, mpicc... It is a good idea to check the availability and exact name of the compiler provided by the module you just loaded:

which mpicc

should print the complete path to a compiler called mpicc. If this is not the case, try an alternative spelling of the compiler name (for instance, mpiCC).
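The check above can be repeated for several candidate names at once. The following is a minimal sketch, not part of Dynamo: the helper first_available is hypothetical, and the candidate names mpicc, mpiCC and mpicxx are assumptions that you should adapt to whatever your loaded module actually provides.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Dynamo): print the first candidate
# name that resolves to a command on PATH. Returns 1 if none is found.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# The candidate names below are assumptions; adjust them to your cluster.
if MPI_COMPILER=$(first_available mpicc mpiCC mpicxx); then
    echo "Using MPI compiler: $MPI_COMPILER"
else
    echo "No MPI compiler wrapper found; load an MPI module first."
fi
```

If no candidate is found, loading a different module and running the check again is usually the next thing to try.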

If you are fortunate, your cluster will have some information system (such as a webpage) that tells you which modules you are expected to use for compilation, and which compilers they provide.

Once you know the name of the compiler that you are going to use (say, mpicc), you can proceed to compile the MPI executables:

cd <DYNAMO_ROOT>/mpi
source dynamo_compile_mpi.sh mpicc

If you get an error during compilation, try with a different module or a different compiler inside the same module.
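The compilation step can be wrapped in a small function that fails early when the compile script is missing. This is only a sketch under assumptions: the function name compile_dynamo_mpi is hypothetical, and it relies solely on the location and name of dynamo_compile_mpi.sh given above. The script is sourced in a subshell so that your working directory is left unchanged.

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with Dynamo).
# $1: path to the Dynamo installation (the <DYNAMO_ROOT> placeholder above)
# $2: name of the MPI compiler to use, e.g. mpicc
compile_dynamo_mpi() {
    dynamo_root=$1
    compiler=$2
    if [ ! -f "$dynamo_root/mpi/dynamo_compile_mpi.sh" ]; then
        echo "dynamo_compile_mpi.sh not found under $dynamo_root/mpi" >&2
        return 1
    fi
    # Subshell: the cd and the positional parameters stay local to it.
    # 'set --' makes the compiler name visible as $1 to the sourced script.
    (
        cd "$dynamo_root/mpi" || exit 1
        set -- "$compiler"
        . ./dynamo_compile_mpi.sh
    )
}

# Example call (the path is a placeholder):
#   compile_dynamo_mpi /path/to/dynamo mpicc
```

A non-zero return value indicates that the script was not found or that compilation reported an error, in which case trying another module or compiler is the natural next step.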


Cluster Header file

A cluster header file allows Dynamo to produce an execution script for a project that follows the specific syntax expected by your cluster's queuing system. Several examples of cluster header files are provided in the <DYNAMO_ROOT>/mpi folder of your Dynamo installation.


Preparing a project

You first need to tune several parameters in your project, through the GUI or the command line.


GUI

After opening a project in the dcp GUI, you need to set the following in the Computing Environment GUI:
  • Make certain that you select the cluster MPI option.
  • Select the number of cores in the field CPU cores. Each one will be handled by a separate MPI task.
  • Make certain that the Parallelized averaging step in the bottom panel is set to zero. This option only applies to Matlab-based computations.
  • Pass the path to the cluster header file.

Command line

All the steps above can be performed through the command line, using the names of the project parameters. You can follow the examples below, where a project called myProject gets its parameters tuned with the command dvput:

  • dvput myProject destination mpi
  • dvput myProject cores 128
  • dvput myProject mwa 0
  • dvput myProject cluster myClusterHeader.sh
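When several projects need the same settings, the four dvput lines can be generated rather than typed by hand. The sketch below is a hypothetical shell helper, not a Dynamo tool: it only prints the commands listed above for a given project name, core count and cluster header path, so that they can be pasted into a Dynamo session. It does not run Dynamo itself.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Dynamo): print the dvput commands
# that configure a project for MPI execution.
# $1: project name, $2: number of cores, $3: path to the cluster header file
print_mpi_settings() {
    project=$1
    cores=$2
    header=$3
    # Reject an empty value, any non-digit character, or a plain 0.
    case $cores in
        ''|*[!0-9]*|0)
            echo "cores must be a positive integer" >&2
            return 1
            ;;
    esac
    printf 'dvput %s destination mpi\n' "$project"
    printf 'dvput %s cores %s\n' "$project" "$cores"
    printf 'dvput %s mwa 0\n' "$project"
    printf 'dvput %s cluster %s\n' "$project" "$header"
}

# Example call:
#   print_mpi_settings myProject 128 myClusterHeader.sh
```

The mwa parameter is always printed as 0, matching the setting of the Parallelized averaging step described for the GUI.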


Remember that the Dynamo command dvhelp lists the project parameters that can be edited by the user through dvput.

Performance

In some clusters, following the above procedure without further tuning can make Dynamo run very slowly. This is normally related to the fact that Dynamo as a standalone runs on the MCR libraries. These libraries might need some tuning for your system.



Using a cluster under Matlab

If your cluster supports running Matlab jobs through the Distributed Computing Engine, you don't need to use the MPI version of Dynamo: there is no need to compile the MPI executables or to design a cluster header file. You just set the destination parameter to matlab_parfor and Matlab will take care of everything.