Programs in Physics & Physical Chemistry
|aeit_v3_0.tar.gz (1722 Kbytes)|
|Manuscript Title: RNGAVXLIB: Program library for random number generation, AVX realization.|
|Authors: M.S. Guskova, L.Yu. Barash, L.N. Shchur|
|Program title: RNGAVXLIB|
|Catalogue identifier: AEIT_v3_0|
Distribution format: tar.gz
|Journal reference: Comput. Phys. Commun. 200(2016)402|
|Programming language: C, Fortran.|
|Computer: PC, laptop, workstation, or server with Intel or AMD processor.|
|Operating system: Unix, Windows.|
|RAM: 4 Mbytes|
|Keywords: Statistical methods, Monte Carlo, Random numbers, Pseudorandom numbers, Random number generation, Advanced Vector Extensions (AVX).|
Does the new version supersede the previous version?: Yes
Nature of problem:
Any calculation requiring a uniform pseudorandom number generator, in particular, Monte Carlo calculations. Any calculation requiring parallel streams of uniform pseudorandom numbers.
The library contains realizations of the following modern and reliable generators: MT19937 [2], MRG32K3A [3], LFSR113 [4], GM19, GM31, GM61 [5, 6], and GM29, GM55, GQ58.1, GQ58.3, GQ58.4 [7, 8]. For each generator there is a realization written in ANSI C, a realization based on the SSE command set, and a realization based on the AVX command set. Vectorization substantially improves the performance of all the generators. For each of the RNGs, the library can also jump ahead inside the RNG sequence and initialize independent random number streams with the block splitting method. C and Fortran are supported.
Reasons for new version:
Modern CPUs support vectorization better than the CPUs available two years ago, when the previous version of the library was prepared. In particular, Advanced Vector Extensions 2 (AVX2) are now supported by CPUs fabricated by Intel and AMD. AVX2 has been supported by Intel CPUs since the Haswell microarchitecture was released in June 2013, and has been supported by AMD CPUs since the Steamroller Family 15h microarchitecture was released in January 2014. An important new feature of this version is the ability to employ the AVX2 instruction set of a CPU in order to speed up the calculations. As a result, the new RNG realizations employing AVX2 are up to 2 times faster than the realizations implemented in the previous version of the library.
Summary of revisions:
For the AVX realizations of the generators, an Intel or AMD CPU supporting the AVX2 command set is required. For the SSE realizations of the generators, an Intel or AMD CPU supporting the SSE2 command set is required. In order to use the SSE realization of the lfsr113 generator, the CPU must support the SSE4.1 command set.
The function call interface has been simplified compared to the previous versions. For each of the generators, RNGAVXLIB supports the following functions, where rng should be replaced by the particular name of the RNG:
void rng_init_(rng_state* state);
void rng_init_sequence_(rng_state* state, unsigned long long SequenceNumber);
void rng_skipahead_(rng_state* state, unsigned long long N);
unsigned int rng_generate_(rng_state* state);
float rng_generate_uniform_float_(rng_state* state);
unsigned int rng_ansi_generate_(rng_state* state);
float rng_ansi_generate_uniform_float_(rng_state* state);
unsigned int rng_sse_generate_(rng_state* state);
float rng_sse_generate_uniform_float_(rng_state* state);
unsigned int rng_avx_generate_(rng_state* state);
float rng_avx_generate_uniform_float_(rng_state* state);
void rng_print_state_(rng_state* state);
The function call interface for the rng_skipahead_ function, which jumps ahead N output values inside an RNG sequence, can be slightly different for some of the RNGs. For example, the function
void mt19937_skipahead_(mt19937_state* state, unsigned long long a, unsigned b);
skips ahead N = a · 2^b numbers, where N < 2^512, and the function
void gm55_skipahead_(gm55_state* state, unsigned long long offset64, unsigned long long offset0);
skips ahead N = 2^64 · offset64 + offset0 numbers. The detailed function call interface can be found in the header files of the include directory. Examples of using the library can be found in the examples directory. Some of the generators have several versions of the rng_init_sequence_ routine, for example, rng_init_short_sequence_, rng_init_medium_sequence_, rng_init_long_sequence_ (see details in [1, 10]). The maximal number of sequences and the maximal length of each sequence for pseudorandom streams are indicated in [1, 10]. The algorithms used to jump ahead in the RNG sequence and to initialize parallel streams of pseudorandom numbers are described in detail in [9, 10].
This version of the library automatically detects whether the CPU supports SSE and/or AVX vectorization at the compilation stage. During the compilation of the library, the -march=native compiler option is used, which allows the use of predefined macros such as __SSE2__ and __AVX2__ in the source code. This is supported by both GNU and Intel compilers. The functions rng_generate_ and rng_generate_uniform_float_ employ SSE and AVX vectorization if the CPU supports them.
Table 1: Speed of the realizations. CPU: Intel Xeon E5-2650v3 (2.3 GHz); Compiler: gcc; Optimization: -O3.
Running time is of the order of 20 sec for generating 10^9 pseudorandom numbers with a PC based on the Intel Core i7-940 CPU. The speed of random number generation on CPUs widely used in modern servers and workstations is shown in Tables 1 and 2, respectively (see also [6, 7]).
|[1]|L.Yu. Barash, L.N. Shchur, RNGSSELIB: Program library for random number generation. More generators, parallel streams of random numbers and Fortran compatibility, Computer Physics Communications, 184 (10), 2367-2369 (2013).|
|[2]|M. Matsumoto and T. Nishimura, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator, ACM Trans. on Mod. and Comp. Simul. 8 (1), 3-30 (1998).|
|[3]|P. L'Ecuyer, Good Parameter Sets for Combined Multiple Recursive Random Number Generators, Oper. Res. 47 (1), 159-164 (1999).|
|[4]|P. L'Ecuyer, Tables of Maximally-Equidistributed Combined LFSR Generators, Math. of Comp., 68 (255), 261-269 (1999).|
|[5]|L. Barash, L.N. Shchur, Periodic orbits of the ensemble of Sinai-Arnold cat maps and pseudorandom number generation, Phys. Rev. E 73, 036701 (2006).|
|[6]|L.Yu. Barash, L.N. Shchur, RNGSSELIB: Program library for random number generation, SSE2 realization, Computer Physics Communications, 182 (7), 1518-1527 (2011).|
|[7]|L.Yu. Barash, Applying dissipative dynamical systems to pseudorandom number generation: Equidistribution property and statistical independence of bits at distances up to logarithm of mesh size, Europhysics Letters (EPL) 95, 10003 (2011).|
|[8]|L.Yu. Barash, Geometric and statistical properties of pseudorandom number generators based on multiple recursive transformations, Springer Proceedings in Mathematics and Statistics, Springer-Verlag, Berlin, Heidelberg, Vol. 23, 265-280 (2012).|
|[9]|L.Yu. Barash, L.N. Shchur, On the generation of parallel streams of pseudorandom numbers, Programmnaya inzheneriya, 1 (2013) 24 (in Russian).|
|[10]|L.Yu. Barash, L.N. Shchur, PRAND: GPU accelerated parallel random number generation library: Using most reliable algorithms and applying parallelism of modern GPUs and CPUs, Computer Physics Communications, 185 (4), 1343-1353 (2014).|
|[11]|Vl.V. Voevodin, S.A. Zhumatiy, S.I. Sobolev, A.S. Antonov, P.A. Bryzgalov, D.A. Nikitenko, K.S. Stefanov, Vad.V. Voevodin, Practice of "Lomonosov" Supercomputer, Open Systems J., Moscow: Open Systems Publ., 2012, no. 7 (in Russian).|