Generalized Mahalanobis depth in the reproducing kernel Hilbert space

Statistische Hefte - Volume 52 - Pages 511-522 - 2009
Yonggang Hu1, Yong Wang1, Yi Wu1, Qiang Li1, Chenping Hou1
1Department of Mathematics and Systems Science, National University of Defense Technology, Changsha, People’s Republic of China

Abstract

In this paper, we propose Mahalanobis depth (MHD) in the Reproducing Kernel Hilbert Space (RKHS). First, we extend the notion of MHD to a generalized version, the generalized MHD (GMHD), which is suitable for small samples whose covariance matrix is singular. We prove that GMHD agrees with MHD when the sample covariance matrix has full rank. Second, we further extend GMHD to the RKHS, yielding the kernel-mapped GMHD (kmGMHD), and discuss its main properties. Numerical results show that kmGMHD gives a better depth interpretation for samples with special shapes, such as non-convex sample sets. The proposed kmGMHD can potentially be used as a robust tool for outlier detection and data classification. In addition, we discuss the influence of the parameters on the shape of the central regions.
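The abstract gives no formulas, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' definitions. Classical MHD is taken as 1 / (1 + (x - mu)^T S^{-1} (x - mu)) with the 1/n sample covariance S. The generalized version is assumed to replace S^{-1} with the Moore-Penrose pseudo-inverse, which makes it agree with MHD whenever S has full rank. The kernel version is assumed to apply the same pseudo-inverse Mahalanobis distance to centred feature vectors, evaluated through the centred Gram matrix using the identity (Phi^T Phi)^+ = Phi^T ((Phi Phi^T)^+)^2 Phi. The names `gmhd`, `kmgmhd`, `rbf_kernel`, `linear_kernel` and the eigenvalue cutoff `tol` are hypothetical.

```python
import numpy as np


def _psd_pinv(A, tol=1e-10):
    """Pseudo-inverse of a symmetric PSD matrix, dropping eigenvalues
    below tol * (largest eigenvalue) as numerically zero."""
    w, V = np.linalg.eigh(A)
    keep = w > tol * w.max()
    return (V[:, keep] / w[keep]) @ V[:, keep].T


def gmhd(x, X, tol=1e-10):
    """Assumed GMHD of point(s) x (shape (d,) or (m, d)) w.r.t. sample X (n, d):
    Mahalanobis depth with the covariance inverse replaced by a pseudo-inverse."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    Xc = X - mu
    S = Xc.T @ Xc / len(X)                      # covariance, 1/n normalisation
    diff = np.atleast_2d(np.asarray(x, dtype=float)) - mu
    d2 = np.einsum("ij,jk,ik->i", diff, _psd_pinv(S, tol), diff)
    return 1.0 / (1.0 + d2)


def linear_kernel(A, B):
    return A @ B.T


def rbf_kernel(A, B, gamma=1.0):
    # Gaussian kernel exp(-gamma * ||a - b||^2); clip tiny negative rounding
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kmgmhd(x, X, kernel, tol=1e-10):
    """Assumed kernel-mapped GMHD: pseudo-inverse Mahalanobis depth of the
    centred feature vector phi(x) - mean(phi(X)), computed from kernels only."""
    X = np.asarray(X, dtype=float)
    Z = np.atleast_2d(np.asarray(x, dtype=float))
    n = len(X)
    K = kernel(X, X)                            # n x n Gram matrix
    Kx = kernel(X, Z)                           # n x m cross-kernel
    H = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    Kc = H @ K @ H                              # centred Gram matrix
    kx = H @ (Kx - K.mean(axis=1, keepdims=True))  # centred cross-kernel
    w, V = np.linalg.eigh(Kc)
    keep = w > tol * w.max()                    # treat tiny eigenvalues as zero
    proj = V[:, keep].T @ kx                    # coordinates in kept eigenbasis
    # d^2 = n * kx^T (Kc^+)^2 kx, evaluated in the eigenbasis
    d2 = n * np.sum((proj / w[keep, None]) ** 2, axis=0)
    return 1.0 / (1.0 + d2)
```

With `linear_kernel`, `kmgmhd` reproduces `gmhd` up to rounding, which serves as a sanity check of the kernel formulation. With `rbf_kernel`, both the bandwidth `gamma` and the cutoff `tol` change the shape of the resulting central regions; a very small `tol` lets near-zero eigenvalues of the centred Gram matrix dominate the distance.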
