Programs in Physics & Physical Chemistry
|Distribution file: acew_v1_0.gz (24 Kbytes)|
|Manuscript Title: ILUCG3: subprograms for the solution of a linear asymmetric matrix equation arising from a 7, 15, 19 or 27 point 3D discretization. See erratum Comput. Phys. Commun. 31(1984)437.|
|Authors: D.V. Anderson|
|Program title: ILUCG3|
|Catalogue identifier: ACEW_v1_0|
Distribution format: gz
|Journal reference: Comput. Phys. Commun. 30(1983)43|
|Programming language: Fortran.|
|Operating system: CTSS.|
|Word size: 64|
|Keywords: General purpose, Partial differential equations, Elliptic, Parabolic, Three-dimensional, Implicit, Preconditioned, Conjugate gradient algorithm, ICCG, ILUCG, Asymmetric matrix, Non-symmetric matrix.|
Nature of problem:
Elliptic and parabolic partial differential equations which arise in plasma physics applications (as well as in others) are solved in three dimensions. Plasma diffusion, equilibria, and phase space transport (Fokker-Planck equation) have been treated by similar methods in two dimensions using the codes ICCG2 and ILUCG2. These problems share the common feature of being stiff and requiring implicit solution techniques. Generally, the resulting matrix equations are asymmetric; we solve them here with the ILUCG3 program. In a subsequent article we describe a simpler and faster algorithm, ICCG3, which should be used when the matrix is symmetric.
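The stiffness argument above can be checked with a small numerical experiment: above the explicit stability limit, a forward-Euler step for a diffusion equation blows up, while a backward-Euler (implicit) step of the same size stays bounded at the cost of solving a matrix equation every step. The sketch below is illustrative Python for a 1D model problem, not part of the distributed Fortran package; all names are invented for the example.

```python
import numpy as np

# 1D heat equation u_t = u_xx on n interior grid points, discretized
# with the standard second-difference operator L (scaled by 1/h^2).
n = 20
L = (np.diag(-2.0 * np.ones(n)) +
     np.diag(np.ones(n - 1), 1) +
     np.diag(np.ones(n - 1), -1))

r = 5.0                                          # dt / h^2, far above the explicit limit 0.5
u0 = np.array([(-1.0) ** j for j in range(n)])   # high-frequency initial data

# Explicit (forward Euler): u_new = u + r * L u -- amplifies high modes.
u_exp = u0.copy()
for _ in range(10):
    u_exp = u_exp + r * (L @ u_exp)

# Implicit (backward Euler): (I - r L) u_new = u -- one linear solve
# per step; unconditionally stable, so the iterates stay bounded.
u_imp = u0.copy()
M = np.eye(n) - r * L
for _ in range(10):
    u_imp = np.linalg.solve(M, u_imp)

print(np.linalg.norm(u_exp), np.linalg.norm(u_imp))
```

In three dimensions the implicit step leads to exactly the kind of large, sparse, generally asymmetric matrix equation that ILUCG3 is designed to solve.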
Solution method:
A generalization of the incomplete Cholesky conjugate gradient (ICCG) algorithm is used to solve the linear asymmetric matrix equation.
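The basic ingredients, an incomplete LU factorization kept on the sparsity pattern of the matrix, used as a preconditioner inside a conjugate-gradient-type iteration, can be sketched compactly. The Python code below is an illustrative stand-in, not the ILUCG3 recursion itself: it uses ILU(0) together with BiCGSTAB, a modern Krylov method for asymmetric systems, on a small banded test matrix (a lower-dimensional analogue of the 7-point case, with an upwinded convection term so that A is not symmetric). All function and variable names are invented for the example.

```python
import numpy as np

def ilu0(A):
    """Incomplete LU with zero fill-in: the factors keep exactly the
    sparsity pattern of A; fill entries outside it are dropped."""
    LU = A.copy()
    nz = A != 0
    n = A.shape[0]
    for k in range(n - 1):
        for i in range(k + 1, n):
            if nz[i, k]:
                LU[i, k] /= LU[k, k]
                for j in range(k + 1, n):
                    if nz[i, j] and nz[k, j]:
                        LU[i, j] -= LU[i, k] * LU[k, j]
    return LU

def lu_apply(LU, r):
    """Solve (L U) z = r, with unit-diagonal L and U packed into LU."""
    n = len(r)
    y = np.zeros(n)
    for i in range(n):                       # forward substitution
        y[i] = r[i] - LU[i, :i] @ y[:i]
    z = np.zeros(n)
    for i in range(n - 1, -1, -1):           # back substitution
        z[i] = (y[i] - LU[i, i + 1:] @ z[i + 1:]) / LU[i, i]
    return z

def solve_ilu_krylov(A, b, tol=1e-10, maxiter=200):
    """ILU(0)-preconditioned BiCGSTAB for an asymmetric system A x = b."""
    LU = ilu0(A)
    x = np.zeros_like(b)
    r = b - A @ x
    rhat = r.copy()
    rho = alpha = omega = 1.0
    v = p = np.zeros_like(b)
    for _ in range(maxiter):
        rho_new = rhat @ r
        beta = (rho_new / rho) * (alpha / omega)
        rho = rho_new
        p = r + beta * (p - omega * v)
        phat = lu_apply(LU, p)               # preconditioning step
        v = A @ phat
        alpha = rho / (rhat @ v)
        s = r - alpha * v
        if np.linalg.norm(s) <= tol * np.linalg.norm(b):
            x = x + alpha * phat
            break
        shat = lu_apply(LU, s)
        t = A @ shat
        omega = (t @ s) / (t @ t)
        x = x + alpha * phat + omega * shat
        r = s - omega * t
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return x

# Banded asymmetric test matrix: diagonals at offsets 0, +/-1, +/-3
# (a 5-point 2D analogue of the 7-point 3D pattern), diagonally dominant.
n = 30
A = (np.diag(4.0 * np.ones(n)) +
     np.diag(-1.4 * np.ones(n - 1), 1) +     # upwinded convection: A != A^T
     np.diag(-0.6 * np.ones(n - 1), -1) +
     np.diag(-0.4 * np.ones(n - 3), 3) +
     np.diag(-0.4 * np.ones(n - 3), -3))
b = np.ones(n)
x = solve_ilu_krylov(A, b)
```

Because the fill outside the original pattern is dropped, the ILU factors are only an approximation to A, which is why they serve as a preconditioner rather than a direct solver.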
Restrictions:
The discretization of the three-dimensional PDE and its boundary conditions must result in a spatial operator with a 7, 15, 19 or 27 point stencil which can be represented by a block-tridiagonal supermatrix composed of block-tridiagonal submatrices whose elements are elementary tridiagonal matrices. The resulting sparse matrix has 7, 15, 19 or 27 non-zero diagonals.
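The correspondence between the 7-point stencil and a 7-diagonal matrix can be verified directly. The Python sketch below (illustrative only; the grid size and lexicographic ordering are assumptions) assembles the 7-point operator on an n x n x n grid and lists the offsets of the nonzero diagonals; with natural ordering they fall at 0, +/-1, +/-n and +/-n^2, which is the block-tridiagonal-of-block-tridiagonals structure described above.

```python
import numpy as np

n = 4          # grid points per dimension (illustrative choice)
N = n ** 3     # total unknowns

def idx(i, j, k):
    # Natural (lexicographic) ordering: i fastest, then j, then k.
    return (k * n + j) * n + i

A = np.zeros((N, N))
for k in range(n):
    for j in range(n):
        for i in range(n):
            m = idx(i, j, k)
            A[m, m] = 6.0                       # centre of the 7-point stencil
            for di, dj, dk in [(-1, 0, 0), (1, 0, 0), (0, -1, 0),
                               (0, 1, 0), (0, 0, -1), (0, 0, 1)]:
                ii, jj, kk = i + di, j + dj, k + dk
                if 0 <= ii < n and 0 <= jj < n and 0 <= kk < n:
                    A[m, idx(ii, jj, kk)] = -1.0

# Offsets of the nonzero diagonals of A: expect 0, +/-1, +/-n, +/-n^2.
offsets = [d for d in range(-N + 1, N)
           if np.any(np.diagonal(A, d) != 0)]
print(offsets)
```

The wider stencils (15, 19 or 27 points) simply add diagonals adjacent to these bands, preserving the same supermatrix structure.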
Unusual features:
The loops are arranged to vectorize on the Cray-1 (with the CFT compiler) wherever possible. Recursive loops (which cannot be vectorized) are written for optimum scalar speed. Some of these loops were made more efficient by increasing the arithmetic per pass while reducing the number of passes. This trick significantly reduced loop overhead and made these scalar loops 30% faster. One of the sources is not in standard FORTRAN; we show how to make it standard.
Running time:
Running times are problem dependent because ill-conditioned matrices require more iterations than well-conditioned ones. However, the running times per iteration are known. For a sufficiently large number of equations (1287 here) the running time per iteration per equation becomes independent of the number of equations. The running times also depend on which of the 16 versions is used. In the test problems, which we regard as typical, the time for the incomplete LU factorization ranged from 4.4 to 34.0 μs per unknown for the versions that employed 7-banded and 27-banded LU sparsity patterns, respectively. Each conjugate gradient iteration required from 3.5 to 21.2 μs per unknown; these times depend strongly on the sparsity pattern of LU and weakly on that of M. In the test problems the number of iterations required to reach a relative residual error of 10^-10 (in the L2 norm) ranged from 8 to 34.