Computer Physics Communications Program Library

Distribution file: aewl_v2_0.tar.gz (125 Kbytes)
Manuscript Title: CUDA programs for solving the time-dependent dipolar Gross-Pitaevskii equation in an anisotropic trap
Authors: Vladimir Loncar, Antun Balaz, Aleksandar Bogojević, Srdjan Škrbić, Paulsamy Muruganandam, Sadhan K. Adhikari
Program title: DBEC-GP-CUDA package, consisting of: (i) imag2dXY-cuda, (ii) imag2dXZ-cuda, (iii) imag3d-cuda, (iv) real2dXY-cuda, (v) real2dXZ-cuda, (vi) real3d-cuda.
Catalogue identifier: AEWL_v2_0
Distribution format: tar.gz
Journal reference: Comput. Phys. Commun. 200(2016)406
Programming language: CUDA C.
Computer: Any modern computer with an Nvidia GPU of Compute Capability 2.0 or higher and with the CUDA toolkit installed (compiler and runtime, including the cuFFT library; minimum version 6.0).
Operating system: Linux.
Has the code been vectorised or parallelized?: Yes, using CUDA. One CPU core and one Nvidia GPU are used.
RAM: With the provided example inputs, the programs should run on a computer with 512 MB of GPU RAM. There is no upper limit to the amount of memory that can be used: larger grids require more memory, which scales as NX*NY or NX*NZ (in 2d) or NX*NY*NZ (in 3d). All programs require roughly the same amount of CPU RAM as GPU RAM.
Supplementary material: A pdf of the full manuscript for this version can be downloaded. It includes an individual summary for each of the above programs and the "Summary of revisions" information.
Keywords: Bose-Einstein condensate, Dipolar atoms, Gross-Pitaevskii equation, Split-step Crank-Nicolson scheme, Real- and imaginary-time propagation, C program, GPU, CUDA program, Partial differential equation.
PACS: 02.60.Lj, 02.60.Jh, 02.60.Cb, 03.75.-b.
Classification: 2.9, 4.3, 4.12.
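
The memory scaling stated under "RAM:" above can be illustrated with a short C sketch. It is only an estimate under stated assumptions: the summary does not say how many arrays the programs allocate or what each grid point stores, so `bytes_per_point` and `narrays` are hypothetical parameters (16 bytes would correspond to one double-precision complex value per point).

```c
#include <assert.h>

/* Rough memory estimate for arrays defined on an nx*ny*nz grid.
 * For a 2d grid pass nz = 1 (or ny = 1 for the XZ programs).
 * bytes_per_point and narrays are hypothetical parameters: the
 * program summary does not state the actual values. */
static unsigned long long grid_bytes(unsigned long long nx,
                                     unsigned long long ny,
                                     unsigned long long nz,
                                     unsigned long long bytes_per_point,
                                     unsigned long long narrays)
{
    assert(nx > 0 && ny > 0 && nz > 0);
    assert(bytes_per_point > 0 && narrays > 0);
    return nx * ny * nz * bytes_per_point * narrays;
}
```

For example, `grid_bytes(256, 256, 256, 16, 1)` gives 268435456 bytes (256 MiB) for a single array, and doubling NX, NY and NZ in 3d multiplies the estimate by eight.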

External routines: CUDA toolkit, version 6.0 or higher, with cuFFT library.

Does the new version supersede the previous version?: No

Nature of problem:
These programs are designed to solve the time-dependent nonlinear partial differential Gross-Pitaevskii (GP) equation with contact and dipolar interactions in two or three spatial dimensions in a harmonic anisotropic trap. The GP equation describes the properties of a dilute trapped Bose-Einstein condensate.

Solution method:
The time-dependent GP equation is discretized in space and time and solved by the split-step Crank-Nicolson method. The discretized equation is propagated, in either imaginary or real time, over small time steps. The contribution of the dipolar interaction is evaluated in momentum space by Fourier transformation, using the convolution theorem. The method yields solutions of stationary and/or non-stationary problems.

Reasons for new version:
Previously published dipolar Fortran and C programs [1], based on earlier programs and algorithms for the GP equation with the contact interaction [2], are already used within the ultra-cold atoms community [3]. However, they are sequential and thus cannot exploit the full computing performance that modern computers offer. For this reason we have explored possible ways to accelerate our programs. Detailed profiling revealed that the calculation of FFTs is the most computationally demanding part of our programs. Since computing FFTs on GPUs with optimized libraries such as cuFFT can give much better performance, we have decided to parallelize our programs using the Nvidia CUDA toolkit. In addition, the massive parallelism offered by GPUs can be exploited to parallelize the nested loops in our programs. We have focused on the 2d and 3d versions of our programs, as they perform enough computation to justify and require the use of massive parallelism.
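
As an illustration of how nested loops over a 3d grid can be handed to many parallel workers, the following plain-C sketch emulates on the CPU the index arithmetic of a strided loop, the pattern that reference [4] describes for CUDA kernels. It is an assumption-laden sketch, not the package's code: the function names are hypothetical, the trap is taken as isotropic for brevity, and the "workers" are ordinary function calls rather than GPU threads.

```c
#include <assert.h>
#include <stddef.h>

/* Harmonic trap potential at grid point (i, j, k), with the grid
 * centred on the origin and spacing dx in every direction.
 * An isotropic trap is assumed here purely for brevity. */
static double trap_potential(size_t i, size_t j, size_t k,
                             size_t nx, size_t ny, size_t nz, double dx)
{
    double x = ((double)i - (double)(nx / 2)) * dx;
    double y = ((double)j - (double)(ny / 2)) * dx;
    double z = ((double)k - (double)(nz / 2)) * dx;
    return 0.5 * (x * x + y * y + z * z);
}

/* Worker `tid` out of `nworkers` fills the row-major array v by
 * visiting flat indices tid, tid + nworkers, tid + 2*nworkers, ...
 * In a CUDA kernel each thread would play the role of one worker. */
static void fill_trap_strided(double *v, size_t nx, size_t ny, size_t nz,
                              double dx, size_t tid, size_t nworkers)
{
    size_t total = nx * ny * nz;
    size_t idx;

    assert(nworkers > 0 && tid < nworkers);
    for (idx = tid; idx < total; idx += nworkers) {
        /* Recover the indices of the original three nested loops. */
        size_t k = idx % nz;
        size_t j = (idx / nz) % ny;
        size_t i = idx / (nz * ny);
        v[idx] = trap_potential(i, j, k, nx, ny, nz, dx);
    }
}
```

Calling `fill_trap_strided` once for every `tid` from 0 to `nworkers - 1` touches each grid point exactly once, whatever the ratio between the grid size and the number of workers, and no two workers write to the same element.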

Summary of revisions:
See "Supplementary material:", above.

Restrictions:
The programs will run only on computers with an Nvidia GPU card (Tesla or GeForce) of Compute Capability 2.0 or higher (Fermi architecture and newer) and with the CUDA toolkit installed (version 6.0 or higher).

Unusual features:
As part of the memory usage optimizations, the programs may slightly increase the number of spatial grid points in each dimension (NX, NY, NZ). This is because the FFT algorithms of the cuFFT library require additional memory to store temporary results. Our programs reuse already allocated memory to provide cuFFT with the temporary memory it requires; however, some problem sizes require much more memory, up to eight times more [5]. For instance, if the number of grid points in any dimension is a large prime number, cuFFT uses an algorithm that requires eight times more memory than it does for a similarly sized power of two. The adjustments to the number of grid points made by the programs ensure that cuFFT does not require such significantly increased additional memory. Any adjustment to the grid size is reported in the output.
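
One way such an adjustment could work is sketched below in C. The summary does not state the rule the programs actually apply, so this is an assumption: the sketch rounds a requested size up to the next integer whose only prime factors are 2, 3, 5 and 7, the class of sizes for which the cuFFT documentation [5] describes its most optimized algorithms. The function names are hypothetical.

```c
#include <assert.h>

/* Returns 1 if n has no prime factor larger than 7, else 0. */
static int has_small_factors_only(long n)
{
    static const long primes[] = {2, 3, 5, 7};
    int p;

    assert(n >= 1);
    for (p = 0; p < 4; p++)
        while (n % primes[p] == 0)
            n /= primes[p];
    return n == 1;
}

/* Smallest size >= n of the form 2^a * 3^b * 5^c * 7^d.
 * Assumed adjustment rule, not necessarily the one used
 * by the DBEC-GP-CUDA programs. */
static long next_fft_friendly(long n)
{
    assert(n >= 1);
    while (!has_small_factors_only(n))
        n++;
    return n;
}
```

Under this rule a prime request such as NX = 127 becomes 128, and 101 becomes 105 (3 * 5 * 7), while a size such as 128 is left unchanged, so the increase stays small.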

Additional comments:
This package consists of six programs; see "Program title:" above. For the particular purpose of each program, please see the pdf available at the "Supplementary material:" link location.

Running time:
The example inputs provided with the programs take less than one minute to run on an Nvidia Tesla M2090 GPU.

[1] R. Kishor Kumar, L. E. Young-S., D. Vudragović, A. Balaz, P. Muruganandam, and S. K. Adhikari, Fortran and C programs for the time-dependent dipolar Gross-Pitaevskii equation in an anisotropic trap, Comput. Phys. Commun. 195 (2015) 117.
[2] P. Muruganandam and S. K. Adhikari, Comput. Phys. Commun. 180 (2009) 1888; D. Vudragović, I. Vidanović, A. Balaz, P. Muruganandam, and S. K. Adhikari, Comput. Phys. Commun. 183 (2012) 2021; P. Muruganandam and S. K. Adhikari, J. Phys. B: At. Mol. Opt. Phys. 36 (2003) 2501.
[3] R. Kishor Kumar, P. Muruganandam, and B. A. Malomed, J. Phys. B: At. Mol. Opt. Phys. 46 (2013) 175302; S. K. Adhikari, Bright dipolar Bose-Einstein-condensate soliton mobile in a direction perpendicular to polarization, Phys. Rev. A 90 (2014) 055601; S. K. Adhikari, Stable matter-wave solitons in the vortex core of a uniform condensate, J. Phys. B: At. Mol. Opt. Phys. 48 (2015) 165303; S. K. Adhikari, Stable spatial and spatiotemporal optical soliton in the core of an optical vortex, Phys. Rev. E 92 (2015) 042926; T. Khellil, A. Balaz, and A. Pelster, Dirty bosons in a quasi-one-dimensional harmonic trap, e-print arXiv:1510.04985 (2015).
[4] M. Harris, CUDA Pro Tip: Write Flexible Kernels with Grid-Stride Loops, Parallel Forall Blog, http://devblogs.nvidia.com/parallelforall/cuda-pro-tip-write-flexible-kernels-grid-stride-loops/ (2013).
[5] cuFFT, CUDA API References, CUDA Toolkit Documentation v7.5, http://docs.nvidia.com/cuda/cufft/ (2015).