Application of Clustering in the Non-Parametric Estimation of Distribution Density
T. Ruzgas
Institute of Mathematics and Informatics, Lithuania
R. Rudzkis
Institute of Mathematics and Informatics, Lithuania
M. Kavaliauskas
Kaunas University of Technology, Lithuania
Published 2006-11-01


non-parametric estimation
multivariate density function
sample clustering
projection pursuit
Monte-Carlo method

How to Cite

Ruzgas, T., Rudzkis, R. and Kavaliauskas, M. (2006) “Application of Clustering in the Non-Parametric Estimation of Distribution Density”, Nonlinear Analysis: Modelling and Control, 11(4), pp. 393–411. doi:10.15388/NA.2006.11.4.14741.


This paper addresses the problem of estimating a multimodal density function of a random vector. The accuracy of several popular non-parametric estimators is compared using the Monte-Carlo method. The paper demonstrates that estimation quality increases significantly if the sample is first clustered (i.e., the multimodal density function is approximated by a mixture of unimodal densities) and the density estimation methods are then applied separately to each cluster. Here the sample is clustered using the Gaussian distribution mixture model and the EM algorithm. In the cases analysed, the highest efficiency was achieved when, after this initial clustering, the iterative procedure proposed by Friedman was used to estimate the density component corresponding to each cluster. The Friedman procedure is based on projection pursuit of the multivariate observations and on transforming the univariate projections into standard Gaussian random values (using the density function estimates of these projections).
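
The cluster-then-estimate idea can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several simplifying assumptions: the data are univariate rather than multivariate, each observation is assigned to the cluster with the highest posterior probability (hard assignment), and each cluster's density is estimated with a plain Gaussian kernel estimator using a normal reference bandwidth instead of Friedman's projection pursuit procedure. All function names (`em_gmm_1d`, `responsibilities`, `kde`, `clustered_density`) and the simulated sample are hypothetical and chosen only for illustration.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def responsibilities(data, weights, mus, sigmas):
    """E-step: posterior probability of each component for each point."""
    out = []
    for x in data:
        p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
        total = sum(p) or 1e-300  # guard against numerical underflow
        out.append([pj / total for pj in p])
    return out


def em_gmm_1d(data, k, iters=100):
    """Fit a k-component univariate Gaussian mixture with plain EM."""
    n = len(data)
    srt = sorted(data)
    # Deterministic start (an assumption): means at evenly spaced quantiles.
    mus = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    sigmas = [max((srt[-1] - srt[0]) / (2.0 * k), 1e-3)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp = responsibilities(data, weights, mus, sigmas)
        # M-step: re-estimate weight, mean and spread of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-3)
    return weights, mus, sigmas


def kde(points):
    """Gaussian kernel density estimate with a normal reference bandwidth."""
    n = len(points)
    mean = sum(points) / n
    sd = max(math.sqrt(sum((p - mean) ** 2 for p in points) / n), 1e-3)
    h = 1.06 * sd * n ** (-0.2)
    return lambda x: sum(normal_pdf(x, p, h) for p in points) / n


def clustered_density(data, k):
    """Cluster with a Gaussian mixture, then estimate each cluster separately."""
    weights, mus, sigmas = em_gmm_1d(data, k)
    resp = responsibilities(data, weights, mus, sigmas)
    clusters = [[] for _ in range(k)]
    for x, r in zip(data, resp):
        clusters[r.index(max(r))].append(x)  # hard assignment
    # Weight each cluster's estimate by its share of the sample.
    parts = [(len(c) / len(data), kde(c)) for c in clusters if c]
    return lambda x: sum(w * f(x) for w, f in parts)


# Simulated bimodal sample (illustrative values only).
random.seed(0)
sample = ([random.gauss(-3.0, 1.0) for _ in range(240)]
          + [random.gauss(3.0, 0.5) for _ in range(160)])
density = clustered_density(sample, 2)
```

Because each cluster receives its own bandwidth, the narrow mode is not oversmoothed by the spread of the wide one, which is the intuition behind applying the estimators per cluster. In the multivariate setting of the paper, the per-cluster `kde` step would be replaced by whichever estimator is under comparison.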