Programs in Physics & Physical Chemistry
|[Licence| Download | New Version Template] abjr_v1_0.gz(8 Kbytes)|
|Manuscript Title: Balanced binary tree code for scientific applications.|
|Authors: S.C. Park, J.P. Draayer|
|Program title: BBTREE|
|Catalogue identifier: ABJR_v1_0|
Distribution format: gz
|Journal reference: Comput. Phys. Commun. 55(1989)189|
|Programming language: Fortran.|
|Computer: IBM 3090/600E, DEC VAX-11/750, IBM-PC/XT.|
|Operating system: MVS/XA 311, VMS 4.5, DOS 3.2.|
|RAM: 6320K words|
|Keywords: General purpose, Utility, Database, Binary tree, Dynamic storage, AVL tree.|
|Classification: 4.14, 9.|
Nature of problem:
Typical scientific programming applications require numerous calls to one or more subroutines. The intermediate results generated by these calls are usually not saved; if the same information is required at a later stage it is simply recalculated. While wasteful of cpu power, this modus operandi is attractive because it spares the user the time and effort associated with the development of complicated data storage and retrieval algorithms. However, if the number of redundant calls to a particular subroutine is large and the time spent in that subprogram amounts to a sizable fraction of the total cpu time, this simplistic programming style must yield to more sophisticated practices that allow for better utilization of the available machine resoures.
This scenario provided motivation for the development of BBTREE which is a binary tree scheme for scientific applications. Specifically, the objective was to develop a set of low-overhead and easy-to-use routines that allow one to build data structures with entries that can be reused and/or dumped as necessary to optimize program execution times against machine memory capacity. The following were considered essential features:
1) Simple - the product should be integrable into existing codes with a minimum amount of effort on the part of the user. 2) Efficient - the routines should require little storage and apply without change to both simple and complex data structures. 3) Fast - fetch and insert operations must be binary in nature so times go as 0[log(n)] where n is the number of items in the list. 4) Passive - program control should be set so reinitializations are unnecessary as one moves form one data set to another. 5) Tunable - the user should be able to fine-tune the application to optimize utilization of the available machine resoures.These features are all realized in the BBTREE package. In particular, one can have any number of different data structures in an application, all using the same binary tree routines for fetch and insert operations.
A height-balanced binary tree solves the problem. It gives 0[log(n)] times for search and insert operations on dynamically generated lists. A full discussion of the theory behind the binary tree concept, commonly referred to as AVL-tree logic after Adel'son-Vel'skiy and Landis who first proposed it, can be found in the book, "Fundamentals of Data Structures" by Horowitz and Sahni. The BBTREE package is a user friendly FORTRAN implementation of the AVL-tree concept. In the package the tree is a linear integer array, ID(-9:*), which consists of ten elements, ID(-9) to ID(0) that specify the structure of the tree, and node information starting with ID(1). Each node includes a key, data, a left-child (LC) and a right-child (RC) pointer, and a balance factor factor (BF). The two pointers and the balance factor are bookkeeping elements that are required for tree traversal. All the linkage operations are handled internally and are transparent to users of the package.
A typical application involves storing and organizing output from a frequently called and cpu-intensive subprogram so later recovery of the information is quick and easy. For this to be done, the parameters of the subroutine must be combined in such a way as to form a unique (integer) identification scheme. This label serves as the key of the node in the tree where either the results of the calculation are kept as data entries (fixed length integer output) or where the node data points to a location and gives the number of elements in another array (for example, variable length noninteger output) where the results for the case in point can be found. This scheme is a trick that is used so the routines in the package can be generic and the space allocated to each node can be the results of a calculation, they usually are used instead to give, for example, the starting index and number of entries in another array (integer or real) where the desired results are actually stored.
The search operation on a tree has three possible outcomes: 1) the key is found indicating the results are available either from the tree as integer data or from external arrays at positions designated by the data; 2) the key is not found, in which case a new node is added to the tree, the desired result is calculated, and the data (actual results or array pointers) is loaded into the new node; 3) the key is not found and adding another node to the tree will cause the tree array to overflow. In the latter case the user is given the option of stopping the calculation or reinitializing the array and continuing. If the array is reinitialized, all previous information stored in the tree is lost. While it is highly desirable to eliminate as much redundancy as possible, this reinitialization option is the feature that allows the user to balance cpu utilization against machine memory capacity. So the routines in the BBTREE package can be used on data structures of fixed size as well as dynamically generated ones, the search routine includes an option which suppresses the automatic addition of a node when the key is not found in the tree.
There are no software restrictions. The routines in the BBTREE package are generic and act passively on the tree array. Since each tree array is dimensioned in the calling program, the numbers quoted above under "high speed storage required" are a true measure of space overhead associated with the use of the package. The maximum number of nodes a tree can have is set by the user. If the pointers are n bit integer words, the theoretical limit for this number is simply 2n -1. In actual practice, however, the number will be much less than this. The number of nodes is related to the dimensioned size of the tree array in the following way: (number of nodes) = (dimensioned size of the tree array less ten for the structure parameters) / (sum of the number words dedicated to the key and data, plus three for the bookkeeping information).
|Disclaimer | ScienceDirect | CPC Journal | CPC | QUB|