Partial least squares discriminant analysis: taking the magic away

Journal of Chemometrics, Volume 28, Issue 4, pages 213-225, 2014
Richard G. Brereton (1), Gavin R. Lloyd (2)
(1) School of Chemistry, University of Bristol, Cantocks Close, Bristol, BS8 1TS, UK
(2) Biophotonics Research Unit, Gloucestershire Hospitals NHS Foundation Trust, Great Western Road, Gloucester, GL1 3NN, UK

Abstract

Partial least squares discriminant analysis (PLS‐DA) has been available for nearly 20 years yet is poorly understood by most users. Simple examples show, graphically and algebraically, that for two classes of equal size, PLS‐DA using one partial least squares (PLS) component gives classification results equivalent to Euclidean distance to centroids, and PLS‐DA using all nonzero components gives results equivalent to linear discriminant analysis. Extensions to unequal class sizes and to more than two classes are discussed, including common pitfalls and dilemmas. Finally, the problems of overfitting and of PLS scores plots are discussed. It is concluded that for classification purposes, PLS‐DA has no significant advantages over traditional procedures and is an algorithm full of dangers. It should not be viewed as a single integrated method but as a step in a full classification procedure. However, despite these limitations, PLS‐DA can provide good insight into the causes of discrimination via weights and loadings, which gives it a unique role in exploratory data analysis, for example in metabolomics via visualisation of significant variables such as metabolites or spectroscopic peaks. Copyright © 2014 John Wiley & Sons, Ltd.
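
The first equivalence claimed in the abstract can be checked numerically. The sketch below is not the authors' own example. It assumes two classes of equal size coded as +1 and -1, column-centred data, a decision threshold of zero on the PLS prediction, and synthetic Gaussian data. The function names are hypothetical. Under these assumptions the one-component PLS weight vector is proportional to the difference between the two class centroids, so the sign of the prediction should agree with the nearest-centroid assignment.

```python
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_one_component_labels(X, y):
    """One-component PLS1 on column-centred X; class is the sign of the prediction.

    Assumes y is coded +1/-1 with equal class sizes, so y is already centred.
    """
    mean = centroid(X)
    Xc = [[v - m for v, m in zip(row, mean)] for row in X]
    # Weight vector: proportional to Xc^T y, scaled to unit length.
    w = [dot(col, y) for col in zip(*Xc)]
    norm = dot(w, w) ** 0.5
    w = [v / norm for v in w]
    # Scores, then regression coefficient of y on the scores.
    t = [dot(row, w) for row in Xc]
    q = dot(t, y) / dot(t, t)
    return [1 if ti * q > 0 else -1 for ti in t]


def nearest_centroid_labels(X, y):
    """Assign each sample to the class whose centroid is closer (Euclidean)."""
    c_pos = centroid([r for r, c in zip(X, y) if c == 1])
    c_neg = centroid([r for r, c in zip(X, y) if c == -1])

    def sq_dist(r, c):
        return sum((a - b) ** 2 for a, b in zip(r, c))

    return [1 if sq_dist(r, c_pos) < sq_dist(r, c_neg) else -1 for r in X]


# Synthetic data (an assumption for illustration): two overlapping Gaussian classes.
rng = random.Random(0)
n_per_class, n_vars = 20, 5
X = [[rng.gauss(0.5, 1.0) for _ in range(n_vars)] for _ in range(n_per_class)]
X += [[rng.gauss(-0.5, 1.0) for _ in range(n_vars)] for _ in range(n_per_class)]
y = [1] * n_per_class + [-1] * n_per_class

pls_labels = pls1_one_component_labels(X, y)
ed_labels = nearest_centroid_labels(X, y)
print("identical assignments:", pls_labels == ed_labels)
```

The agreement depends on the equal class sizes: only then does the overall mean used for centring sit midway between the two centroids, which is where the nearest-centroid boundary lies. With unequal class sizes the two boundaries generally differ, which is one of the pitfalls the abstract refers to.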
