Programs in Physics & Physical Chemistry
|[Licence| Download | New Version Template] aelw_v1_0.tar.gz(34465 Kbytes)|
|Manuscript Title: Sassena - X-ray and Neutron Scattering Calculated from Molecular Dynamics Trajectories using Massively Parallel Computers|
|Authors: Benjamin Lindner, Jeremy C. Smith|
|Program title: Sassena|
|Catalogue identifier: AELW_v1_0|
Distribution format: tar.gz
|Journal reference: Comput. Phys. Commun. 183(2012)1491|
|Programming language: C++, OpenMPI.|
|Computer: Distributed Memory, Cluster of Computers with high performance network, Supercomputer.|
|Operating system: UNIX, LINUX, OSX.|
|Has the code been vectorised or parallelized?: Yes, the code has been parallelized using MPI directives. Tested with up to 7000 processors.|
|RAM: Up to 1Gbytes/core|
|Keywords: Neutron, X-ray, Scattering, Molecular dynamics, Simulation, Massively Parallel.|
|PACS: 32.30Rj, 33.20Rm.|
|Classification: 6.5, 8.|
External routines: Boost Library, FFTW3, CMAKE, GNU C++ Compiler, OpenMPI, LibXML, LAPACK
Nature of problem:
Recent developments in supercomputing allow molecular dynamics simulations to generate large trajectories spanning millions of frames and thousands of atoms. The structural and dynamical analysis of these trajectories requires algorithms that use parallel computation and IO schemes to complete the task in a practical amount of time. The computational and IO requirements depend strongly on the particular analysis algorithm. In scattering calculations, a very frequent pattern is that the trajectory data is used multiple times to compute different projections, which are then aggregated into a single scattering function. Thus, for good performance the trajectory data has to be kept in memory, and the parallel computer must have enough RAM to hold a volatile copy of the whole trajectory. To achieve high performance and good scalability, the mapping of the physical equations onto a parallel computer needs to consider data locality and reduce the amount of inter-node communication.
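The memory requirement described above can be estimated with simple arithmetic. The following is a minimal sketch, not part of Sassena: the function names are hypothetical, and it assumes three coordinates per atom stored as 8-byte floating point values, with 1 Gbyte taken as 10^9 bytes.

```python
import math

def trajectory_bytes(n_frames, n_atoms, bytes_per_coord=8):
    """Bytes needed to hold all x, y, z coordinates of a trajectory.

    Assumption: 3 coordinates per atom per frame, 8 bytes each.
    """
    return n_frames * n_atoms * 3 * bytes_per_coord

def min_cores(n_frames, n_atoms, ram_per_core=10**9):
    """Smallest number of cores whose combined RAM can hold the trajectory.

    Assumption: each core contributes ram_per_core bytes and nothing else
    competes for that memory.
    """
    return math.ceil(trajectory_bytes(n_frames, n_atoms) / ram_per_core)

# Illustrative sizes only: one million frames of a 5000-atom system.
total = trajectory_bytes(1_000_000, 5_000)
cores = min_cores(1_000_000, 5_000)
print(total, cores)
```

Under these assumptions the trajectory occupies 120 Gbytes, so at 1 Gbyte per core at least 120 cores are needed just to hold the coordinates in memory.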
The physical equations for scattering calculations were analyzed, and two major calculation schemes were developed to support any type of scattering calculation (all/self). Certain hardware aspects were taken into account; e.g., high performance computing clusters and supercomputers usually feature a two-tier network system, with Ethernet serving the file storage and InfiniBand carrying the inter-node communication via MPI calls. The time spent loading the trajectory data into memory is minimized by letting each core read only the trajectory data it requires. The performance of inter-node communication is maximized by exclusively using the appropriate MPI calls to exchange the necessary data, resulting in excellent scalability.
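One common way to let each core read only the data it requires is a contiguous block decomposition. The sketch below is an illustration under stated assumptions, not Sassena's actual code: the function name is hypothetical, and splitting along the frame axis is chosen only for the example (the summary does not say which axis each calculation scheme splits).

```python
def frame_range(rank, size, n_frames):
    """Half-open range [start, stop) of frames that one rank would read.

    Frames are split into contiguous blocks; the first (n_frames % size)
    ranks receive one extra frame so the load differs by at most one.
    """
    base, rem = divmod(n_frames, size)
    start = rank * base + min(rank, rem)
    stop = start + base + (1 if rank < rem else 0)
    return start, stop

# Illustrative case: 10 frames distributed over 4 ranks.
ranges = [frame_range(r, 4, 10) for r in range(4)]
print(ranges)
```

Each rank can compute its own range from its rank number alone, so no communication is needed before reading, and no frame is read twice.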
The partitioning scheme developed to map the calculation onto a parallel computer covers a wide variety of use cases without negatively affecting the achieved performance. This is done through a 2D partitioning scheme in which independent scattering vectors are assigned to independent parallel partitions and all communication is local to the partition.
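The 2D partitioning idea can be sketched as follows. This is a hedged illustration rather than the program's implementation: the function names are hypothetical, and both the equal partition sizes and the round-robin assignment of scattering vectors are assumptions made for the example. In MPI terms, the grouping corresponds to what MPI_Comm_split produces when the partition index is used as the color.

```python
def partition_layout(n_ranks, partition_size):
    """Map each global rank to (partition_id, local_rank).

    Assumption: all partitions have the same size, so n_ranks must be a
    multiple of partition_size.
    """
    if n_ranks % partition_size != 0:
        raise ValueError("n_ranks must be a multiple of partition_size")
    return [(rank // partition_size, rank % partition_size)
            for rank in range(n_ranks)]

def qvectors_for_partition(n_qvectors, n_partitions, partition_id):
    """Indices of the scattering vectors handled by one partition.

    Assumption: vectors are dealt out round-robin; each vector belongs to
    exactly one partition, so partitions never need to talk to each other.
    """
    return list(range(partition_id, n_qvectors, n_partitions))

# Illustrative case: 8 ranks in partitions of 4, with 5 scattering vectors.
layout = partition_layout(8, 4)
n_partitions = 8 // 4
work = [qvectors_for_partition(5, n_partitions, p) for p in range(n_partitions)]
print(layout)
print(work)
```

One axis of the layout (the partition index) selects the scattering vectors; the other (the local rank) selects a share of the trajectory data inside the partition, which is where all communication takes place.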
Typical runtimes range from 1 min on 20 nodes to 2 h on 2000 nodes, that is, 0.5 - 4000 CPU hours per execution.