Multiscale geometric methods for data sets I: Multiscale SVD, noise and curvature

Applied and Computational Harmonic Analysis - Volume 43 - Pages 504-567 - 2017
Anna V. Little (1), Mauro Maggioni (2), Lorenzo Rosasco (3, 4, 5)
(1) Department of Mathematics, Jacksonville University, Jacksonville, FL, USA
(2) Department of Mathematics, Johns Hopkins University, Baltimore, MD, USA
(3) Center for Brains Minds and Machines, Massachusetts Institute of Technology, Boston, MA, USA
(4) Laboratory for Computational and Statistical Learning, Istituto Italiano di Tecnologia, Genova, Italy
(5) University of Genova, Genova, Italy
