|[Licence| Download | New Version Template] aerm_v2_0.tar.gz(25 Kbytes)|
|Manuscript Title: Improved CUDA programs for GPU computing of Swendsen-Wang multi-cluster spin flip algorithm: 2D and 3D Ising, Potts, and XY models|
|Authors: Yukihiro Komura, Yutaka Okabe|
|Program title: SWspin_v2_0|
|Catalogue identifier: AERM_v2_0|
Distribution format: tar.gz
|Journal reference: Comput. Phys. Commun. 200(2016)400|
|Programming language: C, CUDA.|
|Computer: System with an NVIDIA CUDA enabled GPU.|
|Operating system: No limits (tested on Linux).|
|RAM: About 2MiB for the parameters used in the sample programs.|
|Keywords: Monte Carlo simulation, cluster algorithm, Ising model, XY model, parallel computing, GPU.|
Does the new version supersede the previous version?: No
Nature of problem:
Monte Carlo simulation of classical spin systems. Ising, q-state Potts model, and the classical XY model are treated for both two-dimensional and three-dimensional lattices.
GPU-based Swendsen-Wang multi-cluster spin flip Monte Carlo method. The CUDA implementation for the cluster-labeling is based on the work by Hawick et al. [K.A. Hawick, A. Leist, and D. P. Playne, Parallel Computing 36 (2010) 655-678], that by Kalentev et al. [O. Kalentev, A. Rai, S. Kemnitzb, and R. Schneider, J. Parallel Distrib. Comput. 71 (2011) 615-620], and that by Komura [Y. Komura, Comput. Phys. Comm. 194 (2015) 54-58].
Reasons for new version:
- Adding the method of GPU-based cluster-labeling algorithm without the
use of conventional iteration .
- Adding a random-number generator in the cuRAND library  for high-
- Fixing several bugs and removing the extra usage of shared memory in the
Summary of revisions:
- Recently, we proposed the GPU-based cluster-labeling algorithm without the use of conventional iteration . This cluster-labeling algorithm does not require an iterative method of comparison with the nearest-neighbor sites. The number of comparisons with the nearest-neighbor site in this method is one for a two dimensional system and two for a three-dimensional system if periodic boundary conditions are not employed. To realize this cluster-labeling algorithm, the atomic function, which is performed without interference from any other threads, is needed.
Now, we explain about the added part of programs. In this update, we add the GPU-based cluster-labeling algorithm without the use of conventional iteration as a direct-type algorithm  to the present programs. This cluster-labeling algorithm consists of four steps: (i) initialization (ii) analysis (iii) label reduction (iv) analysis, and we add those steps to the present programs. for the cluster-labeling algorithm, we can choose the algorithm of Hawick et al.  (algorithm = 0), the algorithm by Kalentev et al.  (algorithm = 1), or the algorithm by Komura  (algorithm = 2). The kernel function
device_function_init _Y K;
is a function for the step of active bond generation, which corresponds to the step of initialization for the algorithm of Komura. Three kernel functions
are used in the step of cluster labeling for the algorithm of Komura. Those functions correspond to the steps of analysis, label reduction, and analysis, respectively. The cluster-labeling algorithm of Komura does not require an iterative method such as the cluster-labeling algorithms of Hawick et al. and Kalentev et al.. The kernel functions
are used in the step of spin flip.
- In the previous programs, we used a linear congruential random-number generator which was proposed by Preis et al. . The computational cost and the usage of memory of the linear congruential random-number generator are low. However, the higher-quality random-number generator is needed for high-precision Monte Carlo simulations. In the CUDA, the cuRAND library , which focuses on the simple and efficient generation of high-quality pseudorandom and quasirandom numbers, is provided. In this update, we modify a linear congruential random-number generator to a random-number generator in the cuRAND library. We use a random-number generator of
the host API in the cuRAND library.
The kernel function
is used to create a random-number generator. In the sample programs, we use an XORWOW pseudorandom generator . However we can choose other random-number generators by changing the parameter of this function. The kernel function
is used to generate random numbers. The generated random numbers are stored in the d_random_data. for high-precision Monte Carlo simulations, we use the random numbers of double precision.
The system size is limited depending on the memory of a GPU. Since the usage of memory in the present programs is increased compared with that of the previous programs, the maximum system size of the present programs is smaller than that of the previous programs.
For the parameters used in the sample programs, it takes about a minute for each program. The computational time depends on the system size, the number of Monte Carlo steps, etc.
| ||Y. Komura, GPU-based cluster-labeling algorithm without the use of conventional iteration: Application to the Swendsen-Wang multi-cluster spin flip algorithm, Comput. Phys. Comm. 194 (2015) 54-58.|
| ||cuRAND CUDA Toolkit Documentation, http://docs.nvidia.com/cuda/curand/|
| ||K.A. Hawick, A. Leist, D. P. Playne, Parallel Graph Component Labelling with GPUs and CUDA, Parallel Computing 36 (2010) 655-678.|
| ||O. Kalentev, A. Rai, S. Kemnitzb, R. Schneider, Connected component labeling on a 2D grid using CUDA, J. Parallel Distrib. Comput. 71 (2011) 615-620.|
| ||T. Preis, P. Virnau, W. Paul, J.J. Schneider, GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model, J. Comp. Phys. 228 (2009) 4468-4477.|
| ||G. Marsaglia, Xorshift RNGs, Journal of Statistical Software 8 (2003).|