Estimating Multivariate Probability Density Function from Sparse Data in High Dimensional Space
in: Proc. 25th Gocad Meeting, Nancy
Abstract
This paper focuses on the estimation of multivariate probability density functions (mpdfs) from
sparse data in high-dimensional space. Mpdfs are particularly important in the exploration, calibration
and analysis of subsurface data: they help identify key features within the data and
describe relationships between the variables. Usually, mpdfs are approximated by discrete
frequency diagrams computed from sampled data, but these are generally biased and not representative
of the underlying population. A solution is either to fit a probability model to the diagram
(parametric approach) or to directly smooth the frequencies (non-parametric approach).
This paper proposes a non-parametric approach based on the discrete smooth interpolation algorithm.
This approach estimates, by orthogonal projection of the data points, a large number of
1D frequency diagrams along evenly distributed directions, and builds an mpdf such that its “marginal”
density along each of these directions is similar to the corresponding frequency diagram. Moreover,
information on the data set, such as marginal means, covariances and quantiles, is taken into account
during the interpolation. Results show that the proposed method (1) provides consistent results in
terms of smoothness and fidelity to the available data and (2) remains robust as the number of available
data points decreases.
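The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not include the discrete smooth interpolation step. It only shows how a sparse point set might be projected orthogonally onto many unit directions and binned into 1D frequency diagrams. The function names, the bin count, and the use of normalized Gaussian samples as a stand-in for "evenly distributed" directions are all assumptions made for illustration.

```python
import numpy as np


def random_unit_directions(n_dirs, dim, seed=0):
    """Hypothetical helper: unit vectors drawn uniformly on the sphere.

    Normalized Gaussian samples are only a stand-in for the evenly
    distributed directions mentioned in the abstract (assumption).
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((n_dirs, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def projected_histograms(data, directions, n_bins=10):
    """Hypothetical helper: one normalized 1D histogram per direction.

    data has shape (n_points, dim); directions has shape (n_dirs, dim).
    """
    # Orthogonal projection onto a unit vector is a dot product.
    proj = data @ directions.T  # shape (n_points, n_dirs)
    hists = []
    for k in range(directions.shape[0]):
        density, edges = np.histogram(proj[:, k], bins=n_bins, density=True)
        hists.append((density, edges))
    return hists


# Synthetic sparse data: 50 points in a 4-dimensional space (illustrative only).
rng = np.random.default_rng(1)
data = rng.standard_normal((50, 4))
dirs = random_unit_directions(20, 4)
hists = projected_histograms(data, dirs)
```

Each entry of `hists` holds the bin densities and bin edges for one direction. In the approach summarized above, such diagrams would then act as constraints on the marginal densities of the interpolated mpdf.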
BibTeX Reference
@inproceedings{FetelCaumon05GM,
  abstract = { This paper focuses on the estimation of multivariate probability density functions (mpdfs) from sparse data in high-dimensional space. Mpdfs are particularly important in the exploration, calibration and analysis of subsurface data: they help identify key features within the data and describe relationships between the variables. Usually, mpdfs are approximated by discrete frequency diagrams computed from sampled data, but these are generally biased and not representative of the underlying population. A solution is either to fit a probability model to the diagram (parametric approach) or to directly smooth the frequencies (non-parametric approach). This paper proposes a non-parametric approach based on the discrete smooth interpolation algorithm. This approach estimates, by orthogonal projection of the data points, a large number of 1D frequency diagrams along evenly distributed directions, and builds an mpdf such that its “marginal” density along each of these directions is similar to the corresponding frequency diagram. Moreover, information on the data set, such as marginal means, covariances and quantiles, is taken into account during the interpolation. Results show that the proposed method (1) provides consistent results in terms of smoothness and fidelity to the available data and (2) remains robust as the number of available data points decreases. },
  author = { Fetel, Emmanuel AND Caumon, Guillaume AND Mallet, Jean-Laurent },
  booktitle = { Proc. 25th Gocad Meeting, Nancy },
  title = { Estimating Multivariate Probability Density Function from Sparse Data in High Dimensional Space },
  year = { 2005 }
}