Elsevier Science Home
Computer Physics Communications Program Library
Full text online from Science Direct
Programs in Physics & Physical Chemistry
CPC Home

[Licence| Download | New Version Template] aehi_v2_0.tar.gz(45 Kbytes)
Manuscript Title: Revision of wFMM - A Wideband Fast Multipole Method for the Two Dimensional Complex Helmholtz Equation
Authors: Min Hyung Cho, Wei Cai
Program title: 2D-WFMM
Catalogue identifier: AEHI_v2_0
Distribution format: tar.gz
Journal reference: Comput. Phys. Commun. 183(2012)446
Programming language: C.
Computer: Any.
Operating system: Any operating system with gcc compiler. For the multi-thread computing, the gcc version 4.4 or newer is recommended.
RAM: Depending on the number of particles N and wavenumber k
Keywords: Wideband Fast Multipole Method, Helmholtz equation, Fast solver.
Classification: 4.8, 4.12.

External routines: OpenMP (http://openmp.org/wp/)

Does the new version supersede the previous version?: Yes

Nature of problem:
Evaluate the interaction between N particles governed by the fundamental solution of 2D complex Helmholtz equation with wide range of wavenumber k.

Solution method:
Multilevel Fast Multipole Algorithm in a hierarchical quad-tree data structure with a cutoff level, which combines low frequency method and high frequency method.

Reasons for new version:
To improve the efficiency of the program including memory usage, repeated use in an iterative solver or other programs, and a minor speed up.

Summary of revisions:
First, the tree searching method in downward pass is modified to accommodate the higher level of tree structure and save memory. The original version used a queue data structure to visit the tree from the top to bottom level. However, in this new version, the queue method is replaced with a simple recursion algorithm. As a result, the code uses less memory, and the high level tree refinement becomes more efficient. Secondly, memory leaks and external variables that caused problems in the repeated usage in an iterative solver, are fixed. The original version had no problem as stand alone software. However, when it was used repeatedly in other codes, the code did not release the memory after it was called. This causes significant problems for large matrix systems. In the new version, all the memory allocations are tracked and freed as soon as they become unnecessary. Consequently, the new version uses almost constant memory for all wavenumbers N is fixed. The new version does not stack up the memory in an iterative solver. Also, in the original version, several external variables were not initialized when the code was called multiple times. This resulted in incorrect numerical solutions and a memory allocation error in worst cases. In the new version, external variables are converted to internal ones. The code is also tested by calling it multiple times in other solvers (results will be published soon). As a minor improvement, some of the complex number operations are simplified, and the running time is slightly reduced. Finally, a simple makefile is added in the package for the easy compile.">>make" will compile one of the test cases described in the readme.rtf file.


Table 1: Memory and CPU time comparisions between the old and new version of wFMM for real k.
     N = 7002 in a [-10, 10]2 box, tree depth = 6
   kcutoff   ErrorMemory (MB)CPU time (sec)
NewOldNewOldDirect
10-1011.71E-0736496734427840
10-513.81E-0736496135417350
0.116.08E-08364102448646860
0.222.70E-07364107248657840
0.327.76E-07363108451698820
0.431.15E-06364101348669310
0.534.19E-07364104651719310
0.631.23E-06364110551719310
0.741.34E-06363101849669310
0.841.10E-06363104748669800
0.941.78E-06363103949669800
146.00E-07363110651719310
254.80E-07363108654719310
566.64E-07363174865689800
1062.63E-073641989991039800
5062.59E-084485756114611659800


The new version of wFMM is compared with the original version in Tables 1 and 2 with the same parameters presented in the original paper. Numerical tests are conducted with a machine consisting of two quad-core Intel Xeon 3.00 GHz processors, 32 GB memory, and a gcc version 4.5.1 running on Fedora release 11. All the results in the tables can be obtained by modifying the testrun.c file. Both tables show the significant memory savings and minor CPU time reduction for both real and complex k for the number of particles N = 490,000.


Table 2: Memory and CPU time comparisions between the old and new version of wFMM for complex k.
     N = 7002 in a [-10, 10]2 box, tree depth = 6
   kcutoff   ErrorMemory (MB)CPU time (sec)
NewOldNewOldDirect
10-10+0.1i11.39E-073631041567421560
10-5+0.1i11.39E-073631063567522050
0.1+0.1i18.11E-083631070628123520
0.2+0.1i25.82E-073631067628128910
0.3+0.1i23.66E-063641034566833320
0.4+0.1i31.63E-063631042668835770
0.5+0.1i37.83E-073631064719535770
0.6+0.1i34.39E-063631021617335770
0.7+0.1i41.98E-063641065729435280
0.8+0.1i42.05E-063631065739635280
0.9+0.1i42.26E-063631034759735280
1+0.1i47.29E-0736410637810435770
2+0.1i58.66E-0736310999511935770
5+0.1i66.30E-07363176618819435770
10+0.1i63.29E-07364194847949436750
50+0.1i62.48E-0844756606653681536260



Running time:
The CPU time depends on the number of particles N and its distribution, wavenumber k, and number of cores in a machine. The computation time increases as N logN.