Computer Physics Communications Program Library
Programs in Physics & Physical Chemistry

adcy_v2_0.gz (18 Kbytes)
Manuscript Title: Optimization of classification trees: strategy and algorithm improvement.
Authors: M. Kroger
Program title: MedTree 4.1
Catalogue identifier: ADCY_v2_0
Distribution format: gz
Journal reference: Comput. Phys. Commun. 99 (1996) 81
Programming language: Fortran.
Computer: PC-compatible.
Operating system: DOS, Unix, Windows, X-Windows.
RAM: 1.5M words
Word size: 32
Keywords: General purpose, Utility, Tree, Statistics, Optimization, Selection rules.
Classification: 4.14.

Nature of problem:
The problem is to find the best classification trees for assigning a specific subject to one of two groups [1,2]. Initially, the user must sample a set of features for a sufficiently large number of representative subjects from both groups. A good tree is expected to be found if there are patterns of behaviour, or even complex correlations, within the input information. The algorithm allows boundary conditions to be taken into account so that the classification tree fits its practical purpose.
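
For orientation, the kind of object being searched for can be sketched in a few lines of Python. This is not the MedTree algorithm, and the program itself is written in Fortran; the names `Node`, `classify` and `best_stump`, the threshold-style tests and the sample data are all assumptions made for illustration. The sketch only shows a two-group tree, how a subject is passed through it, and a brute-force choice of a single split by counting misclassifications.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Node:
    """One node of a two-group classification tree (hypothetical layout)."""
    feature: Optional[int] = None   # index of the feature tested at this node
    threshold: float = 0.0          # subject goes left if value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    group: Optional[int] = None     # 0 or 1 at a leaf, None at inner nodes


def classify(node: Node, subject: Sequence[float]) -> int:
    """Follow the tests from the root until a leaf assigns a group."""
    while node.group is None:
        if subject[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.group


def best_stump(samples: Sequence[Sequence[float]],
               groups: Sequence[int]) -> Tuple[Node, int]:
    """Exhaustively pick the one-split tree with the fewest misclassifications.

    Assumption: every feature is numeric and complete (no missing values).
    """
    best = None
    for f in range(len(samples[0])):
        for t in sorted({s[f] for s in samples}):
            for low_group in (0, 1):
                errors = sum(
                    (low_group if s[f] <= t else 1 - low_group) != g
                    for s, g in zip(samples, groups)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, low_group)
    errors, f, t, low_group = best
    tree = Node(feature=f, threshold=t,
                left=Node(group=low_group), right=Node(group=1 - low_group))
    return tree, errors


# Made-up sample: four subjects, two features each, two groups.
samples = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0)]
groups = [0, 0, 1, 1]
tree, errors = best_stump(samples, groups)
print(tree.feature, tree.threshold, errors)
print(classify(tree, (1.5, 9.0)), classify(tree, (3.5, 0.0)))
```

A real search has to weigh many such splits against each other across several levels and under the user's boundary conditions, which is where the optimization strategy described in Ref. [1] comes in.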

Solution method:
See the reference in CPC to the previous version. The method of solution can be steered towards specific goals by setting variables (see Tab. 2).

Reasons for new version:
The new version was written to reduce the large field dimensions, so that more temporary trees can be handled on given hardware; to make it possible to analyze incomplete data; to produce especially short trees that are strongly coupled to the given closing conditions; and to simplify the adjustment and make the effect of the selection mechanisms visible.

Restrictions:
See the reference in CPC to the previous version. New rejection mechanisms reduce the number of objects and thus allow trees of higher quality to be calculated. Parameters that limit the field dimensions of the variables are collected in Tab. 3 of Ref. [1].

Unusual features:
Revisions are marked within the source code by the term '!REV4.1'.

Running time:
The typical running time is one order of magnitude lower than that of the original version. At start-up, the program gives an estimate of the computing time needed. If this time is too long to be practicable, the problem can be made tractable by modifying program variables.

References:
[1] M. Kroger and B. Kroger, Comput. Phys. Commun. 95 (1996) 58-72.
[2] L. Breiman, J. Friedman, R.A. Olshen and C.J. Stone, Classification and regression trees (Belmont, CA, Wadsworth, 1984).