Elsevier Science Home
Computer Physics Communications Program Library
Full text online from Science Direct
Programs in Physics & Physical Chemistry
CPC Home

[Licence| Download | New Version Template] aepk_v1_0.tar.gz(152 Kbytes)
Manuscript Title: Penalized Splines for Smooth Representation of High-dimensional Monte Carlo Datasets
Authors: Nathan Whitehorn, Jakob van Santen, Sven Lafebre
Program title: Photospline
Catalogue identifier: AEPK_v1_0
Distribution format: tar.gz
Journal reference: Comput. Phys. Commun. 184(2013)2214
Programming language: C, Python.
Computer: 32- and 64-bit x86, 32- and 64-bit PowerPC.
Operating system: Linux, Mac OS X, FreeBSD.
Has the code been vectorised or parallelized?: Both
RAM: Approximately proportional to number of knots used in fitting, depends on problem condition
Keywords: Splines, Monte Carlo, Histograms, Maximum Likelihood.
PACS: 02.60.Ed.
Classification: 4.9.

External routines: SuiteSparse (http://www.cise.ufl.edu/research/sparse/SuiteSparse/), Python (http://www.python.org/), BLAS (http://www.netlib.org/blas/), Numpy (http://www.numpy.org/)

Nature of problem:
An algorithm to smoothly represent histogram, including mathematical operations and convolutions. Using histograms of Monte Carlo simulation for likelihood fitting can be unstable due to binning artifacts from statistical fluctuations and hard bin-to-bin transitions. This package provides a toolkit for using penalized spline fits on extremely large multi-dimensional datasets to reduce or eliminate such issues.

Solution method:
Uses sparse matrix operations, non-negative least-squares fitting, and generalized linear array models in conjunction with a number of other algorithms to allow fits to be made, manipulated, and saved with very low computational requirements. This enables even very large problems to be solved on commercially available machines.

Restrictions:
Required computation time and memory increases very rapidly with the number of dimensions. Fits without stacking involving more than 5 dimensions and 20 knots on each are usually not practical on 2012-era hardware.

Running time:
Roughly proportional to the cube of the number of knots used, depends strongly on conditioning of the problem.