Computer Physics Communications Program Library
Programs in Physics & Physical Chemistry

adcy_v1_0.gz (15 Kbytes)
Manuscript Title: A novel algorithm to optimize classification trees.
Authors: M. Kroger, B. Kroger
Program title: MedTree 3.1
Catalogue identifier: ADCY_v1_0
Distribution format: gz
Journal reference: Comput. Phys. Commun. 95(1996)58
Programming language: Fortran.
Computer: PC-compatible.
Operating system: DOS, Unix, Windows, X-Windows.
RAM: 1.5M words
Word size: 32
Keywords: General purpose, Classification, Tree, Statistics, Optimization, Selection rules, Utility.
Classification: 4.14.

Nature of problem:
The problem is to find the best classification trees for assigning a given subject to one of two groups [1]. Initially, the user must sample a set of features for a sufficiently large number of representative subjects from both groups. A good tree is expected to be found if simple schemes of behaviour, or even complex correlations, exist within the input information. The algorithm allows boundary conditions to be taken into account, so that the classification tree fits its practical purpose.

Solution method:
The best trees are found by first generating 'suitable' objects of possible trees and subsequently analyzing all objects, starting with the objects in the highest hierarchies. The optimal tree is calculated by following the best paths along the arms of the trees and storing all relevant information until the procedure reaches the top of the tree. At the top, it is finally decided which tree is best for a given set of selectable parameters. The user can guide the algorithm towards a specific goal by setting a set of variables (see Tab. 2).
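
The bottom-up idea described above (evaluate the best subtree for each arm, store the result, and decide at the top) can be illustrated with a minimal sketch. This is not the MedTree algorithm or its Fortran code: it is a generic exhaustive search with memoization, and it assumes that every feature is a yes/no answer and that the quality criterion is the number of misclassified subjects. The names best_tree, classify and max_depth are hypothetical and do not appear in the program.

```python
from functools import lru_cache


def best_tree(samples, labels, max_depth):
    """Exhaustively find a depth-limited tree minimising misclassifications.

    samples: list of tuples of 0/1 answers (one entry per yes/no question)
    labels:  list of 0/1 group memberships, one per sample
    Returns (errors, tree); a tree is either ('leaf', group) or
    ('ask', question_index, subtree_if_no, subtree_if_yes).
    Hypothetical illustration only, not the MedTree procedure.
    """
    n_questions = len(samples[0])

    @lru_cache(maxsize=None)
    def solve(subset, depth):
        # subset: frozenset of sample indices that reach this node
        ones = sum(labels[i] for i in subset)
        zeros = len(subset) - ones
        # Best leaf: predict the majority group (group 0 on a tie).
        best = (min(ones, zeros), ('leaf', 1 if ones > zeros else 0))
        if depth == 0 or best[0] == 0:
            return best
        for q in range(n_questions):
            yes = frozenset(i for i in subset if samples[i][q])
            no = subset - yes
            if not yes or not no:
                continue  # this question does not split the subset
            # Results for both arms are computed once and cached.
            err_no, tree_no = solve(no, depth - 1)
            err_yes, tree_yes = solve(yes, depth - 1)
            if err_no + err_yes < best[0]:
                best = (err_no + err_yes, ('ask', q, tree_no, tree_yes))
        return best

    return solve(frozenset(range(len(samples))), max_depth)


def classify(tree, sample):
    """Follow the questions of a tree until a leaf is reached."""
    while tree[0] == 'ask':
        _, q, tree_no, tree_yes = tree
        tree = tree_yes if sample[q] else tree_no
    return tree[1]


# Made-up data: the group is the exclusive-or of the first two answers.
samples = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = [0, 1, 1, 0]
errors, tree = best_tree(samples, labels, max_depth=2)
print(errors, [classify(tree, s) for s in samples])
```

With this made-up data set, a single question cannot separate the two groups, whereas two questions in sequence can. The sketch stores one result per (subset, depth) pair, which is the sense in which the relevant information is kept while moving towards the top of the tree.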

The complexity of the problem increases with the number of questions to be followed (variable ifollow) and the maximum number of objects (variable setmobjects), both of which depend on the input data and on restrictions imposed for practical reasons. Parameters that limit the field dimensions of the variables are collected in Tab. 3.
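
To give a feeling for how quickly such a search space can grow, the following sketch counts the distinct question layouts of a binary tree. It is a generic counting argument, not the cost model used by the program: it assumes that each node is either a leaf or one of n_questions yes/no questions with an independent subtree on each arm, and that a question may be repeated along a path. The names count_trees, n_questions and depth are hypothetical and are not the program variables ifollow or setmobjects.

```python
def count_trees(n_questions, depth):
    """Number of distinct question layouts with at most `depth` levels.

    Recurrence (assumption of this sketch): T(0) = 1 (a single leaf) and
    T(d) = 1 + n_questions * T(d - 1) ** 2, i.e. either a leaf, or one of
    n_questions questions with an independent subtree on each arm.
    """
    count = 1  # depth 0: only a leaf is possible
    for _ in range(depth):
        count = 1 + n_questions * count * count
    return count


for d in range(4):
    print(d, count_trees(10, d))
```

Under these assumptions the count roughly squares with every additional level, which is why limiting the depth and the number of stored objects matters for keeping a search of this kind practicable.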

Running time:
The running time ranges from about 1 second upwards without bound, depending on the size of the problem. At start-up, the program gives an estimate of the required computing time. If this estimate is too high to be practicable, the problem can be made tractable by modifying program variables.

[1] L. Breiman, J. Friedman, R.A. Olshen and C.J. Stone, Classification and Regression Trees (Wadsworth, Belmont, CA, 1984).