Programs in Physics & Physical Chemistry
Distribution file: adul_v2_0.tar.gz (178 Kbytes)
Manuscript Title: ARVO-CL: The OpenCL Version of the ARVO Package - An Efficient Tool for Computing the Accessible Surface Area and the Excluded Volume of Proteins via Analytical Equations
Authors: Ján Buša Jr., Shura Hayryan, Ming-Chya Wu, Ján Buša, Chin-Kun Hu
Program title: ARVO-CL
Catalogue identifier: ADUL_v2_0
Distribution format: tar.gz
Journal reference: Comput. Phys. Commun. 183 (2012) 2494
Programming language: C, OpenCL.
Computer: PC Pentium; SPP'2000.
Operating system: All OpenCL-capable systems.
Has the code been vectorised or parallelized?: Parallelized using GPUs. A serial (non-GPU) version is also included in the package.
Keywords: ARVO, Proteins, Solvent accessible area, Excluded volume, Stereographic projection, OpenCL package.
PACS: 87.14.Ee, 87.15.Aa, 02.60.Jh, 05.10.-a.
External routines: cl.hpp (http://www.khronos.org/registry/cl/api/1.1/cl.hpp)
Does the new version supersede the previous version?: Yes
Nature of problem:
Molecular mechanics computations, continuum percolation
Solution method:
Numerical algorithm based on analytical formulas, applied after a stereographic transformation.
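The stereographic transformation maps the surface of each atomic sphere onto a plane, where the intersections with neighboring spheres become circles that can be handled analytically. The C sketch below shows only the projection itself; the choice of projecting from the north pole N = c + (0, 0, r) onto the plane tangent at the south pole, the function name `stereographic_project` and the type `vec3` are assumptions for illustration, and the plane used in ARVO may differ. A circle on the sphere that passes through N is mapped to a straight line (its image reaches infinity), which is why the position of the north pole relative to neighboring spheres matters.

```c
#include <assert.h>

/* Hypothetical 3-D point type used only in this sketch. */
typedef struct { double x, y, z; } vec3;

/* Project point p on the sphere (center c, radius r) from the north pole
   N = c + (0, 0, r) onto the plane z = c.z - r, tangent at the south pole.
   Writes planar coordinates (u, v) and returns 0; returns -1 if p is
   (numerically) the north pole, whose image lies at infinity. */
int stereographic_project(vec3 p, vec3 c, double r, double *u, double *v)
{
    assert(r > 0.0);
    double denom = p.z - c.z - r;   /* p.z - N.z, zero at the north pole */
    if (denom > -1e-12 * r)         /* at (or numerically above) the pole */
        return -1;
    double t = -2.0 * r / denom;    /* parameter where N + t(p - N) hits the plane */
    *u = c.x + t * (p.x - c.x);     /* N.x == c.x */
    *v = c.y + t * (p.y - c.y);     /* N.y == c.y */
    return 0;
}
```

With this choice the equator of a sphere of radius r is mapped to a circle of radius 2r centered below the sphere's center.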
Reasons for new version:
During the past decade we have published a number of protein-structure-related algorithms and software packages [1, 2, 3, 4, 5, 6], which have received considerable attention from researchers and have found interesting applications. For example, ARVO has been used to find that the ratios of volume V to surface area A for proteins in the Protein Data Bank (PDB) fall within a narrow range. Such a result is useful for finding the native structures of proteins.
Therefore, we consider that there is a demand to revise and modernize these tools and to make them more efficient. Here we present a new version of the ARVO package. The original ARVO package was written in FORTRAN. One reason for the new version is to rewrite it in C, making it friendlier to young researchers who are not familiar with FORTRAN. Another, more important, reason is to exploit the speed-up offered by modern graphics cards. We also wanted to eliminate the need to recompile the program for every molecule; for this purpose, we have added the possibility of using general PDB files as input. Once compiled, the program can process any number of input files in succession. Finally, we found it necessary to revisit the algorithm and apply some techniques to avoid unnecessary memory usage, making the package more efficient.
Summary of revisions:
Improvements compared with the original version:
Support for files in the format created by 'input structure'; input of parameters (name of the input file) via the command line; dynamic array sizes, removing the need to recompile the program after any change in the size of structures; memory allocation according to the actual demands of the application; replacement of the north pole test by a slight reduction of the radius (see below).
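Reading general PDB input and sizing arrays to the actual number of atoms can be sketched as follows. This is a hedged illustration, not the ARVO-CL reader: the names `atom_array`, `parse_pdb_coords` and `atom_array_push` are hypothetical, only the standard fixed PDB columns for x, y and z (31-38, 39-46, 47-54) of ATOM/HETATM records are used, and assigning van der Waals radii to the atoms is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable coordinate array: capacity follows actual demand. */
typedef struct { double *xyz; size_t n, cap; } atom_array;

/* Copy a fixed-width field (1-based start column, width) and convert it. */
static double pdb_field(const char *line, int col, int width)
{
    char buf[16];
    assert(width < (int)sizeof buf);
    memcpy(buf, line + col - 1, (size_t)width);
    buf[width] = '\0';
    return strtod(buf, NULL);
}

/* Parse x, y, z from an ATOM/HETATM record (columns 31-38, 39-46, 47-54).
   Returns 1 on success, 0 for other record types or too-short lines. */
int parse_pdb_coords(const char *line, double out[3])
{
    if (strncmp(line, "ATOM  ", 6) != 0 && strncmp(line, "HETATM", 6) != 0)
        return 0;
    if (strlen(line) < 54)
        return 0;
    out[0] = pdb_field(line, 31, 8);
    out[1] = pdb_field(line, 39, 8);
    out[2] = pdb_field(line, 47, 8);
    return 1;
}

/* Append one atom, growing the buffer (doubling) only when it is full.
   Returns 0 on success, -1 if realloc fails. */
int atom_array_push(atom_array *a, const double p[3])
{
    if (a->n == a->cap) {
        size_t cap = a->cap ? 2 * a->cap : 1024;
        double *tmp = realloc(a->xyz, cap * 3 * sizeof *tmp);
        if (!tmp)
            return -1;
        a->xyz = tmp;
        a->cap = cap;
    }
    memcpy(a->xyz + 3 * a->n, p, 3 * sizeof *p);
    a->n++;
    return 0;
}
```

A driver would loop over the file names given on the command line, read each file line by line with `fgets`, push every parsed atom, run the calculation and release the buffer with `free` before the next file, so no recompilation is needed for molecules of different sizes.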
To compile an OpenCL program, one needs to download and install the appropriate driver and software development kit (SDK). The program itself consists of two parts: one running on the CPU and one running on the GPU. The CPU initializes communication between the computer and the GPU, loads the data, processes it, and exports the results. The GPU performs the parallel part of the calculation: the search for neighboring atoms and the computation of each atom's contribution to the total area and volume of the molecule. For details of the algorithm, see references [3, 4].
In OpenCL programming, more attention must be paid to memory usage than in a classical approach. Device memory is usually limited, so some changes to the original algorithm are necessary. First, unlike in the FORTRAN version of the program, no structures containing lists of neighbor atoms are created. Instead, neighbors are searched for on the fly, while the contribution of each individual atom is being calculated.
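The on-the-fly neighbor search can be sketched in plain C as below. This is an illustration under assumptions, not the ARVO-CL kernel: `count_neighbors` and `total_neighbor_pairs` are hypothetical names, the radii are assumed to already include the water (probe) radius, and a real kernel would compute the atom's area and volume contribution instead of just counting neighbors. In OpenCL, `gid` would come from `get_global_id(0)` inside a `__kernel` function, one work-item per atom; here the host loop plays that role, as in a serial version.

```c
#include <assert.h>
#include <stddef.h>

/* Count neighbors of atom gid on the fly: atoms j != gid whose spheres
   (radii already inflated by the probe radius) overlap sphere gid.
   No neighbor list is stored. xyz holds 3*n coordinates, r holds n radii. */
int count_neighbors(size_t gid, size_t n, const double *xyz, const double *r)
{
    assert(gid < n);
    int count = 0;
    const double *ci = xyz + 3 * gid;
    for (size_t j = 0; j < n; j++) {
        if (j == gid)
            continue;
        const double *cj = xyz + 3 * j;
        double dx = cj[0] - ci[0], dy = cj[1] - ci[1], dz = cj[2] - ci[2];
        double d2 = dx * dx + dy * dy + dz * dz;
        double rs = r[gid] + r[j];
        if (d2 < rs * rs)   /* squared comparison avoids a sqrt */
            count++;
    }
    return count;
}

/* Serial stand-in for the NDRange launch: one "work-item" per atom. */
long total_neighbor_pairs(size_t n, const double *xyz, const double *r)
{
    long sum = 0;
    for (size_t gid = 0; gid < n; gid++)
        sum += count_neighbors(gid, n, xyz, r);
    return sum / 2;     /* each overlapping pair is counted from both ends */
}
```

The trade-off is extra arithmetic (every work-item rescans all atoms) in exchange for not storing neighbor lists in the limited device memory.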
The idea behind the north pole test and molecule rotation [4, Sec. 4.7] has been changed. If, during the north pole test, the north pole of the active sphere lies close to the surface of a neighboring sphere, the radius of that neighboring sphere is multiplied by 0.9999 instead of rotating the whole molecule. This allows the algorithm to continue normally. Since the area of a sphere scales as r^2 and its volume as r^3, reducing the radius of one atom in this way changes the area and the volume of that atom by 0.02% and 0.03%, respectively. As an atom usually contributes only part of its total area (volume) to the total area (volume) of the protein, and since a protein contains many atoms, the change of the total area (volume) is much smaller than 0.02% (0.03%). Tests showed relative errors ranging from 10^-8 up to 10^-4. An additional benefit of this approach is that the molecule is not rotated, so the errors that such a rotation would introduce do not occur. We were even able to find a protein (1S1I, with 31938 atoms) for which, after several hundred rotations, ARVO could not find a position in which the original north pole test would pass. For such proteins the new approach is the only possible one.
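The radius-reduction rule can be sketched as follows. Only the factor 0.9999 comes from the description above; the tolerance `NP_EPS` (in Å), the function names and the exact form of the "close to the surface" test are assumptions made for illustration. The helper `relative_shrink` reproduces the 0.02% and 0.03% figures as 1 - 0.9999^2 and 1 - 0.9999^3.

```c
#include <assert.h>

/* Hypothetical tolerance (Angstrom) for "lies close to the surface". */
#define NP_EPS 1e-5

/* Returns 1 if the north pole of sphere i, ci + (0, 0, ri), lies within
   NP_EPS of the surface of sphere j (squared distances, no sqrt needed). */
int north_pole_near_surface(const double ci[3], double ri,
                            const double cj[3], double rj)
{
    double dx = ci[0] - cj[0], dy = ci[1] - cj[1];
    double dz = ci[2] + ri - cj[2];
    double d2 = dx * dx + dy * dy + dz * dz;
    double lo = rj - NP_EPS, hi = rj + NP_EPS;
    assert(lo > 0.0);
    return d2 > lo * lo && d2 < hi * hi;
}

/* Instead of rotating the molecule, shrink the offending neighbor by
   the factor 0.9999. Returns 1 if the radius was reduced. */
int fix_north_pole(const double ci[3], double ri,
                   const double cj[3], double *rj)
{
    if (!north_pole_near_surface(ci, ri, cj, *rj))
        return 0;
    *rj *= 0.9999;
    return 1;
}

/* Relative change of r^k (k = 2: area, k = 3: volume) when r is scaled by s. */
double relative_shrink(double s, int k)
{
    double p = 1.0;
    for (int i = 0; i < k; i++)
        p *= s;
    return 1.0 - p;
}
```

For typical atomic radii plus the 1.4 Å probe, one reduction moves the surface by about 3 x 10^-4 Å, which is larger than the assumed tolerance, so the test then passes without touching the rest of the molecule.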
Some data obtained using the north pole test (with rotation) and without the north pole test (with radius reduction) are summarized in Table 1. The radius of the water molecule was set to 1.4 Å, and Rashin's set of van der Waals radii of atoms was used. The first column contains the protein name and the number of atoms. Each cell of the second and third columns contains two numbers. The upper number is the volume (surface area) obtained using the original ARVO algorithm with the conventional north pole test and rotation. The lower number shows the difference resulting from the new approach. The upper number in the fourth column is the number of rotations required by the original version, and the lower number is the number of atoms whose radius has been reduced. The relative errors of the volume (upper number) and area (lower number) obtained using radius reduction are shown in the last column. Clearly, the error is negligible.
Table 1: Comparison of volumes and surface areas of different proteins obtained by the original ARVO and by the new version, using different strategies for dealing with the "north pole". First column: the PDB ID of the protein and the number of atoms. Second column: the volume of the protein obtained with the original ARVO (upper number) and the difference with the new approach (lower number). Third column: the same as in the second column, for the surface area. Fourth column: the number of rotations of the molecule in the original ARVO (upper number) and the number of atoms whose radius has been reduced in the new version (lower number). Fifth column: the relative errors for the volume (upper number) and the area (lower number).
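The relative errors in the last column are presumably the difference (lower number in the volume or area column) divided by the original ARVO value (upper number). Under that assumption, with V and A denoting volume and area, they read:

```latex
\delta_V = \frac{\lvert V_{\mathrm{new}} - V_{\mathrm{ARVO}} \rvert}{V_{\mathrm{ARVO}}},
\qquad
\delta_A = \frac{\lvert A_{\mathrm{new}} - A_{\mathrm{ARVO}} \rvert}{A_{\mathrm{ARVO}}}
```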
One disadvantage is that OpenCL calculations are guaranteed to be available only in single precision. This is because the OpenCL standard includes double-precision floating-point operations not as a core feature but only as an optional extension, so the availability of double-precision calculations depends on the device (CPU, GPU) vendor. Switching to double precision degrades performance (calculations in double precision are 2 to 8 times slower than the same calculations in single precision). Another problem is that once the double-precision switch is used, all calculations are done in double precision, which leads to insufficient memory. This problem can be bypassed by explicitly switching to single precision where possible, but doing so requires careful modification of the whole program source. Since double precision was available on our GPU (NVIDIA GTX 480), we decided to use it only for the critical parts of the algorithm (such as the integral calculation), leaving the non-critical parts in single precision. This allowed us to speed up the calculation while obtaining acceptable results.
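Why the accumulation is a "critical part" can be seen from a small C experiment: many single-precision terms summed in a float accumulator drift noticeably, while the same float terms summed in a double accumulator stay accurate. The functions below are hypothetical and only illustrate the mixed-precision idea; in an OpenCL C kernel, using `double` additionally requires a device that reports the `cl_khr_fp64` extension and `#pragma OPENCL EXTENSION cl_khr_fp64 : enable`.

```c
#include <assert.h>
#include <stddef.h>

/* All-single-precision version: float terms, float accumulator. */
float sum_all_float(const float *terms, size_t n)
{
    assert(terms != NULL || n == 0);
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += terms[i];
    return acc;
}

/* Mixed precision: terms stay float (cheaper, half the memory), but the
   critical accumulation is carried out in double. */
double sum_mixed(const float *terms, size_t n)
{
    assert(terms != NULL || n == 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)terms[i];
    return acc;
}
```

Summing one million copies of `0.1f` with `sum_all_float` misses the exact sum by several hundred, whereas `sum_mixed` agrees with it to far better than 10^-4.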
Results of the test calculations are given in Table 2. All calculations except for 2brd0 were performed with a water radius of 1.4 Å. The first column contains the protein name and the number of atoms. The second column contains the computation time in seconds (FORTRAN/CPU in the upper part and OpenCL/GPU in the lower part). The third column gives the speed-up (time on the CPU divided by time on the GPU). The fourth and fifth columns contain the volume and area calculated in FORTRAN (upper number) and the difference from the results obtained with OpenCL (lower number). As can be seen, the area and volume obtained with FORTRAN (in double precision) and with the OpenCL implementation (a combination of single and double precision) are practically the same. This is even clearer from the relative errors of the OpenCL implementation shown in the last column (upper number for volume, lower number for area). As for computational time, the FORTRAN (or C) implementation is preferable when the calculation takes less than approximately 2 seconds. This is because OpenCL needs some time, about 0.3 s to 1.5 s on the test configuration, to initialize the device and start communication. The speed-up is clearly visible for large proteins, where the parallel approach can be exploited, but the complexity of the protein must be taken into account as well. Compare the times for 2brd (water radius 1.4 Å) and 2brd0 (water radius 0 Å). The difference lies in the number of neighbors (overlapping spheres): for a water radius of 1.4 Å the number of neighbors is high and using the GPU is efficient, whereas for a water radius of 0 Å it is better to use the CPU. All results were obtained on a test configuration with an Intel Core i7 930 CPU running at 2.8 GHz and an NVIDIA GeForce GTX 480 GPU.
Table 2: Comparative data on precision and computational times obtained with the FORTRAN and OpenCL implementations of ARVO. The structure of the columns is similar to that of Table 1. Note that the last protein (1s1i) was not calculated with the FORTRAN implementation, so the comparison presented is between the C and OpenCL versions. This is because we were unable to find a rotation for which the north pole test would pass.
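The roughly 2-second threshold can be related to the device start-up cost with a simple model, which is an assumption of this sketch rather than something fitted to Table 2: if the CPU needs time t_cpu, the GPU needs t_init + t_cpu / s, where t_init is the initialization overhead and s the speed-up of the parallel part. The GPU pays off when t_init + t_cpu / s < t_cpu, i.e. when t_cpu > t_init * s / (s - 1). The function names are hypothetical.

```c
#include <assert.h>

/* Break-even CPU time under the model t_gpu = t_init + t_cpu / s:
   the GPU is faster once t_cpu > t_init * s / (s - 1). */
double breakeven_cpu_time(double t_init, double s)
{
    assert(s > 1.0);   /* with no speed-up the GPU never pays off */
    return t_init * s / (s - 1.0);
}

/* Returns 1 if the model predicts that the GPU run is faster. */
int prefer_gpu(double t_cpu, double t_init, double s)
{
    return t_cpu > breakeven_cpu_time(t_init, s);
}
```

For example, an overhead of 1.5 s combined with a hypothetical speed-up of 4 gives a break-even time of exactly 2 s. Because s itself depends on the number of overlapping neighbors (compare 2brd and 2brd0), the model is only a rough guide for choosing between the serial and the GPU version.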
At the time of writing, OpenCL allowed only 1/4 of the total memory of a device (CPU or GPU) to be allocated by a single call to malloc. This restriction can be bypassed by four separate allocation calls, each requesting 1/4 of the device memory. It is advisable to use a dedicated GPU for the calculations, since sharing a GPU between calculations and graphics display can lead to unexpected results due to concurrent access to the device memory.
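Splitting one large request into several allocations can be sketched as a small planning function. This is a hedged sketch: `split_allocation` is a hypothetical helper, and in real host code `max_alloc` would be obtained with `clGetDeviceInfo(..., CL_DEVICE_MAX_MEM_ALLOC_SIZE, ...)`, whose guaranteed minimum in OpenCL 1.1 is the larger of one quarter of `CL_DEVICE_GLOBAL_MEM_SIZE` and 128 MB. Each planned chunk would then be created as a separate buffer (for example with `clCreateBuffer`), and the kernel would have to index across the chunks.

```c
#include <assert.h>
#include <stddef.h>

/* Split a request of `total` bytes into chunks no larger than `max_alloc`.
   Writes the chunk sizes into sizes[] and returns the number of chunks,
   or 0 if more than max_chunks chunks would be needed. */
size_t split_allocation(size_t total, size_t max_alloc,
                        size_t *sizes, size_t max_chunks)
{
    assert(total > 0 && max_alloc > 0);
    size_t count = total / max_alloc + (total % max_alloc != 0);
    if (count > max_chunks)
        return 0;
    for (size_t i = 0; i < count; i++) {
        size_t left = total - i * max_alloc;   /* bytes not yet assigned */
        sizes[i] = left < max_alloc ? left : max_alloc;
    }
    return count;
}
```

With `max_alloc` equal to one quarter of the device memory, a request for the whole memory yields exactly the four allocations described above.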
Restrictions:
The program does not account for possible cavities inside the molecule. The current version works in a combination of single and double precision (see Summary of revisions for details).
Running time:
Depends on the size of the molecule under consideration. For molecules whose running time was less than 2 seconds with the old version, performance is likely to decrease. This changes considerably for larger molecules (on the test configuration, speed-ups of up to 34 were obtained).
References:
[1] F. Eisenmenger, U. H. E. Hansmann, S. Hayryan, C.-K. Hu, Comput. Phys. Commun. 138 (2001) 192.
[2] F. Eisenmenger, U. H. E. Hansmann, S. Hayryan, C.-K. Hu, Comput. Phys. Commun. 174 (2006) 422.
[3] S. Hayryan, C.-K. Hu, J. Skrivánek, E. Hayryan, I. Pokorný, J. Comput. Chem. 26 (2005) 334.
[4] J. Busa, J. Dzurina, E. Hayryan, S. Hayryan, C.-K. Hu, J. Plavka, I. Pokorný, J. Skrivánek, M.-C. Wu, Comput. Phys. Commun. 165 (2005) 59.
[5] J. Busa, S. Hayryan, C.-K. Hu, J. Skrivánek, M.-C. Wu, J. Comput. Chem. 30 (2009) 346.
[6] J. Busa, S. Hayryan, C.-K. Hu, J. Skrivánek, M.-C. Wu, Comput. Phys. Commun. 181 (2010) 2116.
[7] M.-C. Wu, M. S. Li, W.-J. Ma, M. Kouza, C.-K. Hu, EPL 96 (2011) 68005.
[8] B. Lee, F. M. Richards, J. Mol. Biol. 55 (1971) 379.
[9] F. M. Richards, Annu. Rev. Biophys. Bioeng. 6 (1977) 151.
[10] A. Shrake, J. A. Rupley, J. Mol. Biol. 79 (1973) 351.
[11] A. A. Rashin, M. Iofin, B. Honig, Biochemistry 25 (1986) 3619.
[12] C. Chothia, Nature 248 (1974) 338.
[13] http://www.nvidia.com/object/cuda_home_new.html