Estimating Multivariate Probability Density Function from Sparse Data in High Dimensional Space

in: Proc. 25th Gocad Meeting, Nancy

Abstract

This paper focuses on the estimation of multivariate probability density functions (mpdfs) from sparse data in high-dimensional space. Mpdfs are particularly important in the exploration, calibration and analysis of subsurface data: they help identify key features within the data and describe relationships between the variables. Usually, mpdfs are approximated by discrete frequency diagrams computed from sampled data, but these are generally biased and not representative of the underlying population. A solution is either to fit a probability model to the diagram (parametric approach) or to smooth the frequencies directly (non-parametric approach). This paper proposes a non-parametric approach based on the discrete smooth interpolation algorithm. The approach estimates, by orthogonal projection of the data points, a large number of 1D frequency diagrams along evenly distributed directions, and builds an mpdf such that its “marginal” density along each of these directions is similar to the corresponding frequency diagram. Moreover, information on the data set, such as marginal means, covariances and quantiles, is taken into account during the interpolation. Results show that the proposed method (1) provides consistent results in terms of smoothness and honoring of the available data and (2) remains robust as the number of available data points decreases.
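The projection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it is restricted to 2D, takes evenly spaced angles as the "evenly distributed directions", and uses hypothetical function names and bin counts. It only computes the 1D frequency diagrams; the discrete smooth interpolation step that builds the mpdf from them is not shown.

```python
import math


def unit_directions_2d(n_dirs):
    """Unit vectors at evenly spaced angles in [0, pi); opposite directions
    give mirrored projections, so half a circle is enough."""
    return [(math.cos(math.pi * k / n_dirs), math.sin(math.pi * k / n_dirs))
            for k in range(n_dirs)]


def project(points, direction):
    """Orthogonal projection of each point onto a unit direction (dot product)."""
    return [sum(x * u for x, u in zip(p, direction)) for p in points]


def frequency_diagram(values, n_bins, lo, hi):
    """Relative frequencies of values in n_bins equal-width bins over [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), n_bins - 1)] += 1  # clamp edge values
    return [c / len(values) for c in counts]


def projected_diagrams(points, n_dirs=8, n_bins=10):
    """One 1D frequency diagram per direction, all on the same support."""
    # Every projection lies in [-r, r], where r is the largest point norm.
    r = max(math.hypot(*p) for p in points) or 1.0
    return [frequency_diagram(project(points, d), n_bins, -r, r)
            for d in unit_directions_2d(n_dirs)]


# Hypothetical sparse sample of five 2D points (assumed data, for illustration).
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, -1.0), (2.0, 0.5)]
diagrams = projected_diagrams(sample, n_dirs=6, n_bins=8)
```

Each entry of `diagrams` is a list of relative frequencies summing to 1 for one direction. In higher dimensions, choosing evenly distributed directions is harder than spacing angles, and this sketch does not address that.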

Download / Links

    BibTeX Reference

    @inproceedings{FetelCaumon05GM,
     abstract = { This paper focuses on the estimation of multivariate probability density functions (mpdfs) from
    sparse data in high-dimensional space. Mpdfs are particularly important in the exploration, calibration
    and analysis of subsurface data: they help identify key features within the data and
    describe relationships between the variables. Usually, mpdfs are approximated by discrete
    frequency diagrams computed from sampled data, but these are generally biased and not representative
    of the underlying population. A solution is either to fit a probability model to the diagram
    (parametric approach) or to smooth the frequencies directly (non-parametric approach).
    This paper proposes a non-parametric approach based on the discrete smooth interpolation algorithm.
    The approach estimates, by orthogonal projection of the data points, a large number of
    1D frequency diagrams along evenly distributed directions, and builds an mpdf such that its “marginal”
    density along each of these directions is similar to the corresponding frequency diagram. Moreover,
    information on the data set, such as marginal means, covariances and quantiles, is taken into account
    during the interpolation. Results show that the proposed method (1) provides consistent results in
    terms of smoothness and honoring of the available data and (2) remains robust as the number of available
    data points decreases. },
     author = { Fetel, Emmanuel AND Caumon, Guillaume AND Mallet, Jean-Laurent },
     booktitle = { Proc. 25th Gocad Meeting, Nancy },
     title = { Estimating Multivariate Probability Density Function from Sparse Data in High Dimensional Space },
     year = { 2005 }
    }