
Journal of Chemometrics
SCIE-ISI SCOPUS (1987-2023)
1099-128X
0886-9383
United Kingdom
Publisher: John Wiley and Sons Ltd, WILEY
Representative articles
Partial least squares (PLS) was not originally designed as a tool for statistical discrimination. In spite of this, applied scientists routinely use PLS for classification and there is substantial empirical evidence to suggest that it performs well in that role. The interesting question is: why can a procedure that is principally designed for overdetermined regression problems locate and emphasize group structure? Using PLS in this manner has heuristic support owing to the relationship between PLS and canonical correlation analysis (CCA) and the relationship, in turn, between CCA and linear discriminant analysis (LDA). This paper replaces the heuristics with a formal statistical explanation. As a consequence, it will become clear that PLS is to be preferred over PCA when discrimination is the goal and dimension reduction is needed. Copyright © 2003 John Wiley & Sons, Ltd.
A generic preprocessing method for multivariate data, called orthogonal projections to latent structures (O‐PLS), is described. O‐PLS removes variation from
In this paper we develop the mathematical and statistical structure of PLS regression. We show the PLS regression algorithm and how it can be interpreted in model building. The basic mathematical principles that lie behind two block PLS are depicted. We also show the statistical aspects of the PLS method when it is used for model building. Finally we show the structure of the PLS decompositions of the data matrices involved.
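As an illustration of the kind of algorithm and decomposition this abstract refers to, the following is a minimal sketch of single-response PLS regression (PLS1) using NIPALS-style deflation. It is not taken from the paper: the function names `pls1_fit` and `pls1_predict`, the dictionary layout, and the synthetic data are all assumptions made for the example. Each component yields a weight vector, a score vector, an X-loading and a y-loading, and both blocks are deflated before the next component is extracted.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit PLS1 by successive deflation (illustrative sketch, hypothetical API)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred X block
    f = [v - y_mean for v in y]                                # centred y block
    W, P, Q, T = [], [], [], []
    for _ in range(n_components):
        # Weight vector: unit direction in X-space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(dot(w, w))
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]  # scores for this component
        tt = dot(t, t)
        # X-loading and y-loading from regressing each block on the scores.
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = dot(f, t) / tt
        # Deflate both blocks by the part explained by this component.
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p_load)
        Q.append(q)
        T.append(t)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q, "T": T}


def pls1_predict(model, x):
    """Predict y for one sample by replaying the deflation steps on it."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p_load, q in zip(model["W"], model["P"], model["Q"]):
        t = dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p_load)]
    return y_hat


# Synthetic, noise-free data (an assumption for the demo): y is exactly linear in X.
random.seed(0)
X_demo = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(20)]
y_demo = [1.0 + 2.0 * r[0] - 1.0 * r[1] + 0.5 * r[2] for r in X_demo]

model = pls1_fit(X_demo, y_demo, n_components=3)
fit_errors = [abs(pls1_predict(model, x) - yi) for x, yi in zip(X_demo, y_demo)]
print("largest absolute fit error:", max(fit_errors))
```

With as many components as X has independent columns, the PLS1 fit coincides with ordinary least squares, so the fit error on this noise-free data should be at rounding level. In practice the number of components is kept smaller and chosen by validation.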
The characteristics of the OPLS method have been investigated for the purpose of discriminant analysis (OPLS‐DA). We demonstrate how class‐orthogonal variation can be exploited to augment classification performance in cases where the individual classes exhibit divergence in within‐class variation, in analogy with soft independent modelling of class analogy (SIMCA) classification. The prediction results will be largely equivalent to traditional supervised classification using PLS‐DA if no such variation is present in the classes. A discriminatory strategy is thus outlined, combining the strengths of PLS‐DA and SIMCA classification within the framework of the OPLS‐DA method. Furthermore, resampling methods have been employed to generate distributions of predicted classification results and subsequently assess classification belief. This enables utilisation of the class‐orthogonal variation in a proper statistical context. The proposed decision rule is compared to common decision rules and is shown to produce comparable or less class‐biased classification results. Copyright © 2007 John Wiley & Sons, Ltd.
A new diagnostic called the
Partial least squares discriminant analysis (PLS‐DA) has been available for nearly 20 years yet is poorly understood by most users. By simple examples, it is shown graphically and algebraically that for two equal class sizes, PLS‐DA using one partial least squares (PLS) component provides equivalent classification results to Euclidean distance to centroids, and by using all nonzero components to linear discriminant analysis. Extensions where there are unequal class sizes and more than two classes are discussed including common pitfalls and dilemmas. Finally, the problems of overfitting and PLS scores plots are discussed. It is concluded that for classification purposes, PLS‐DA has no significant advantages over traditional procedures and is an algorithm full of dangers. It should not be viewed as a single integrated method but as a step in a full classification procedure. However, despite these limitations, PLS‐DA can provide good insight into the causes of discrimination via weights and loadings, which gives it a unique role in exploratory data analysis, for example in metabolomics via visualisation of significant variables such as metabolites or spectroscopic peaks. Copyright © 2014 John Wiley & Sons, Ltd.
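The equivalence stated in this abstract for two equally sized classes can be checked numerically. The sketch below is an illustration only, not the paper's own code: the function names, the +1/-1 class coding, the zero decision threshold and the synthetic data are assumptions. With equal class sizes, the mean of X is the midpoint of the two centroids and the single PLS weight vector is proportional to the difference of the centroids, so the sign of the one-component prediction matches the nearest-centroid rule.

```python
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def plsda_one_component_labels(X, y):
    """One-component PLS-DA on classes coded +1/-1, thresholded at zero."""
    n, p = len(X), len(X[0])
    x_mean = mean_vector(X)
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector X^T y; its length does not affect the predictions below.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]             # scores
    q = sum(a * b for a, b in zip(yc, t)) / sum(a * a for a in t)      # y-loading
    y_hat = [y_mean + q * ti for ti in t]
    return [1 if v > 0 else -1 for v in y_hat]


def nearest_centroid_labels(X, y):
    """Assign each sample to the class whose centroid is closest (Euclidean)."""
    pos = mean_vector([row for row, lab in zip(X, y) if lab == 1])
    neg = mean_vector([row for row, lab in zip(X, y) if lab == -1])

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [1 if sq_dist(row, pos) < sq_dist(row, neg) else -1 for row in X]


# Synthetic overlapping classes of equal size (an assumption for the demo).
random.seed(1)
X_pos = [[random.gauss(0.5, 1.0) for _ in range(4)] for _ in range(15)]
X_neg = [[random.gauss(-0.5, 1.0) for _ in range(4)] for _ in range(15)]
X_demo = X_pos + X_neg
y_demo = [1] * 15 + [-1] * 15

plsda_labels = plsda_one_component_labels(X_demo, y_demo)
centroid_labels = nearest_centroid_labels(X_demo, y_demo)
print("identical assignments:", plsda_labels == centroid_labels)
```

The agreement depends on the equal class sizes: with unequal sizes the mean of X is no longer the midpoint between the centroids, and the two rules can differ, which is one of the pitfalls the abstract mentions.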
This report describes significance testing for PLS and OPLS® (orthogonal PLS) models. The testing is applicable to single‐
This paper presents a dedicated investigation and practical description of how to apply PARAFAC modeling to complicated fluorescence excitation–emission measurements. The steps involved in finding the optimal PARAFAC model are described in detail based on the characteristics of fluorescence data. These steps include choosing the right number of components, handling problems with missing values and scatter, detecting variables influenced by noise and identifying outliers. Various validation methods are applied in order to ensure that the optimal model has been found and several common data‐specific problems and their solutions are explained. Finally, interpretations of the specific models are given. The paper can be used as a tutorial for investigating fluorescence landscapes with multi‐way analysis. Copyright © 2003 John Wiley & Sons, Ltd.
This study compares the application of two variable selection methods in partial least squares regression (PLSR), the variable importance in projection (VIP) method and the selectivity ratio (SR) method. For this purpose, three different data sets were analysed: (a) physicochemical water quality parameters related to sensorial data, (b) gas chromatography–mass spectrometry (GC‐MS) chemical (organic compound) profiles from fossil sea sediment samples related to sea surface temperature (SST) changes, and (c) exposed genes of