Principal component analysis: a review and recent developments

Ian T. Jolliffe¹, Jorge Cadima²
¹College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter, UK
²Secção de Matemática (DCEB), Instituto Superior de Agronomia, Universidade de Lisboa, Tapada da Ajuda, Lisboa 1340-017, Portugal

Abstract

Large datasets are increasingly common and are often difficult to interpret. Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability while at the same time minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance. Finding such new variables, the principal components, reduces to solving an eigenvalue/eigenvector problem, and the new variables are defined by the dataset at hand, not a priori, hence making PCA an adaptive data analysis technique. It is adaptive in another sense too, since variants of the technique have been developed that are tailored to different data types and structures. This article will begin by introducing the basic ideas of PCA, discussing what it can and cannot do. It will then describe some variants of PCA and their application.
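The eigenvalue/eigenvector formulation mentioned above can be illustrated with a minimal NumPy sketch. It assumes a data matrix X with n rows (observations) and p columns (variables), column-centres it, and takes the eigenvectors a_k of the sample covariance matrix S (S a_k = λ_k a_k, with λ_1 ≥ λ_2 ≥ … ≥ λ_p) as the coefficient vectors of the principal components. The function name pca_eig and the simulated data are hypothetical and serve only as an illustration, not as the authors' implementation.

```python
import numpy as np

def pca_eig(X):
    """Hypothetical sketch: PCA via eigendecomposition of the sample covariance matrix.

    X: (n, p) array, rows are observations, columns are variables.
    Returns (eigvals, eigvecs, scores):
      eigvals  - variances of the principal components, in decreasing order
      eigvecs  - p x p matrix whose columns are the PC coefficient vectors (loadings)
      scores   - n x p matrix of principal component scores (the new variables)
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # column-centre the data
    S = np.cov(Xc, rowvar=False)            # p x p sample covariance (divisor n - 1)
    eigvals, eigvecs = np.linalg.eigh(S)    # symmetric eigenproblem; eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # reorder so variance decreases
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                   # each PC is a linear combination of the original variables
    return eigvals, eigvecs, scores

# Hypothetical correlated data: 200 observations of 3 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.5, 0.2, 0.3]])
vals, vecs, Z = pca_eig(X)

# The PC scores are uncorrelated, and their variances equal the eigenvalues
print(np.allclose(np.cov(Z, rowvar=False), np.diag(vals)))

# Proportion of total variance retained by each component (guides how many to keep)
print(np.round(vals / vals.sum(), 3))
```

Using numpy.linalg.eigh rather than a general eigensolver exploits the symmetry of S. In practice PCA is often computed from the singular value decomposition of the centred data matrix instead; its right singular vectors give the same coefficient vectors, with λ_k equal to the squared singular values divided by n - 1.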
