Journal of Chemometrics

  ISSN (online): 1099-128X

  ISSN (print): 0886-9383

  United Kingdom

Publisher: John Wiley and Sons Ltd (Wiley)

Subject areas:
Applied Mathematics, Analytical Chemistry

Featured articles

Partial least squares for discrimination
Vol. 17, No. 3, pp. 166-173, 2003
Matthew L Barker, William S. Rayens
Partial least squares (PLS) was not originally designed as a tool for statistical discrimination. In spite of this, applied scientists routinely use PLS for classification and there is substantial empirical evidence to suggest that it performs well in that role. The interesting question is: why can a procedure that is principally designed for overdetermined regression problems locate and emphasize group structure? Using PLS in this manner has heuristic support owing to the relationship between PLS and canonical correlation analysis (CCA) and the relationship, in turn, between CCA and linear discriminant analysis (LDA). This paper replaces the heuristics with a formal statistical explanation. As a consequence, it will become clear that PLS is to be preferred over principal component analysis (PCA) when discrimination is the goal and dimension reduction is needed. Copyright © 2003 John Wiley & Sons, Ltd.
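A short piece of algebra shows why PLS with a class-indicator response picks up group structure. This is a sketch in our own notation, not the paper's proof. Assume a column-centred data matrix X (n × p), g classes of sizes n_k with class means \bar{x}_k and overall mean \bar{x}, and an n × g 0/1 indicator matrix Y. The first PLS weight vector is the dominant eigenvector of X^T Y Y^T X. This matrix is a reweighted version of the between-groups scatter matrix B used by LDA:

```latex
X^\top Y = \bigl[\, n_1(\bar{x}_1-\bar{x}),\ \dots,\ n_g(\bar{x}_g-\bar{x}) \,\bigr]
\quad\Longrightarrow\quad
X^\top Y Y^\top X = \sum_{k=1}^{g} n_k^{2}\,(\bar{x}_k-\bar{x})(\bar{x}_k-\bar{x})^\top,
\qquad
B = \sum_{k=1}^{g} n_k\,(\bar{x}_k-\bar{x})(\bar{x}_k-\bar{x})^\top .
```

The result is unchanged if Y is also centred, because X^T 1 = 0. With equal class sizes the two matrices are proportional, so the leading PLS weight coincides with the leading eigenvector of B. With unequal sizes, larger classes receive extra weight. Unlike LDA, PLS does not additionally whiten by the within-group scatter.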
Orthogonal projections to latent structures (O‐PLS)
Vol. 16, No. 3, pp. 119-128, 2002
Johan Trygg, Svante Wold
A generic preprocessing method for multivariate data, called orthogonal projections to latent structures (O‐PLS), is described. O‐PLS removes variation from X (descriptor variables) that is not correlated to Y (property variables, e.g. yield, cost or toxicity). In mathematical terms this is equivalent to removing systematic variation in X that is orthogonal to Y. In an earlier paper, Wold et al. (Chemometrics Intell. Lab. Syst. 1998; 44: 175–185) described orthogonal signal correction (OSC). This paper describes a method with the same objective that uses different means. The proposed O‐PLS method analyzes the variation explained in each PLS component. The non‐correlated systematic variation in X is removed. This makes the resulting PLS model easier to interpret, with the additional benefit that the non‐correlated variation itself can be analyzed further. As an example, near‐infrared (NIR) reflectance spectra of wood chips were analyzed. Applying O‐PLS resulted in reduced model complexity with preserved prediction ability, effective removal of non‐correlated variation in X and, not least, improved interpretability of both correlated and non‐correlated variation in the NIR spectra. Copyright © 2002 John Wiley & Sons, Ltd.
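The filtering idea can be sketched for a single response y. This is a minimal NumPy version that follows the single-y O-PLS steps as we understand them: take the predictive weight w, split the predictive loading p into a part along w and a part orthogonal to it, then remove the scores and loadings of that orthogonal direction from X. The function name `opls_filter` and the simulated data are illustrative assumptions, not the authors' code.

```python
import numpy as np

def opls_filter(X, y, n_ortho=1):
    """Hypothetical helper: remove n_ortho y-orthogonal components from X.

    Single-y O-PLS sketch. Returns the filtered (mean-centred) X together
    with the orthogonal scores T_o and loadings P_o.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    E = X - X.mean(axis=0)              # work on mean-centred X
    yc = y - y.mean()

    # Predictive weight: direction of covariance between X and y. Removing
    # y-orthogonal components leaves E.T @ yc unchanged, so w stays fixed.
    w = E.T @ yc
    w /= np.linalg.norm(w)

    T_o, P_o = [], []
    for _ in range(n_ortho):
        t = E @ w                        # predictive scores
        p = E.T @ t / (t @ t)            # loading of the predictive component
        w_o = p - (w @ p) * w            # part of p orthogonal to w
        w_o /= np.linalg.norm(w_o)
        t_o = E @ w_o                    # y-orthogonal scores
        p_o = E.T @ t_o / (t_o @ t_o)    # y-orthogonal loadings
        E = E - np.outer(t_o, p_o)       # remove the orthogonal variation
        T_o.append(t_o)
        P_o.append(p_o)
    return E, np.column_stack(T_o), np.column_stack(P_o)

# Simulated data: a small y-related signal plus a large nuisance component.
rng = np.random.default_rng(0)
n, k = 40, 6
y = rng.normal(size=n)
nuisance = rng.normal(size=n)
X = (np.outer(y, rng.normal(size=k))
     + 5.0 * np.outer(nuisance, rng.normal(size=k))
     + 0.05 * rng.normal(size=(n, k)))
X_filt, T_o, P_o = opls_filter(X, y, n_ortho=1)
```

By construction the removed scores are uncorrelated with y, and X_filt^T y equals the original centred X^T y. A PLS model fitted on `X_filt` therefore sees the same covariance with y but less structured noise.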
PLS regression methods
Vol. 2, No. 3, pp. 211-228, 1988
Agnar Höskuldsson
In this paper we develop the mathematical and statistical structure of PLS regression. We show the PLS regression algorithm and how it can be interpreted in model building. The basic mathematical principles that lie behind two block PLS are depicted. We also show the statistical aspects of the PLS method when it is used for model building. Finally we show the structure of the PLS decompositions of the data matrices involved.
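For readers who want a concrete version of the PLS regression algorithm, here is a minimal NumPy sketch of single-response PLS (PLS1) with NIPALS-style deflation. It uses the usual coefficient formula b = W (P^T W)^{-1} q. The function name `pls1` and the test data are our own illustrative choices and are not taken from the paper.

```python
import numpy as np

def pls1(X, y, n_components):
    """Hypothetical helper: PLS1 (single y) via NIPALS-style deflation.

    Returns coefficients b and intercept b0 such that y_hat = X @ b + b0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E = X - x_mean                       # X-residuals, deflated per component
    f = y - y_mean                       # y-residuals
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                      # weight: maximises covariance of E w with f
        w /= np.linalg.norm(w)
        t = E @ w                        # X-scores
        tt = t @ t
        p = E.T @ t / tt                 # X-loadings
        c = f @ t / tt                   # y-loading (regression of f on t)
        E = E - np.outer(t, p)           # deflate X
        f = f - c * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # b = W (P^T W)^{-1} q
    return b, y_mean - x_mean @ b

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=30)
b2, b0_2 = pls1(X, y, n_components=2)   # a reduced (biased) model
b5, b0_5 = pls1(X, y, n_components=5)   # as many components as variables
```

A useful sanity check is the following. When X has full column rank and the number of components equals the number of variables, PLS1 reproduces ordinary least squares. With fewer components it gives a shrunken, lower-dimensional model.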
OPLS discriminant analysis: combining the strengths of PLS‐DA and SIMCA classification
Vol. 20, No. 8-10, pp. 341-351, 2006
Max Bylesjö, Mattias Rantalainen, Olivier Cloarec, Jeremy K. Nicholson, Elaine Holmes, Johan Trygg
The characteristics of the OPLS method have been investigated for the purpose of discriminant analysis (OPLS‐DA). We demonstrate how class‐orthogonal variation can be exploited to augment classification performance in cases where the individual classes exhibit divergence in within‐class variation, in analogy with soft independent modelling of class analogy (SIMCA) classification. The prediction results will be largely equivalent to traditional supervised classification using PLS‐DA if no such variation is present in the classes. A discriminatory strategy is thus outlined, combining the strengths of PLS‐DA and SIMCA classification within the framework of the OPLS‐DA method. Furthermore, resampling methods have been employed to generate distributions of predicted classification results and subsequently assess classification belief. This enables utilisation of the class‐orthogonal variation in a proper statistical context. The proposed decision rule is compared to common decision rules and is shown to produce comparable or less class‐biased classification results. Copyright © 2007 John Wiley & Sons, Ltd.
A new efficient method for determining the number of components in PARAFAC models
Vol. 17, No. 5, pp. 274-286, 2003
Rasmus Bro, Henk A. L. Kiers
A new diagnostic called the core consistency diagnostic (CORCONDIA) is suggested for determining the proper number of components for multiway models. It applies especially to the parallel factor analysis (PARAFAC) model, but also to other models that can be considered as restricted Tucker3 models. It is based on scrutinizing the ‘appropriateness’ of the structural model based on the data and the estimated parameters of gradually augmented models. A PARAFAC model (employing dimension‐wise combinations of components for all modes) is called appropriate if adding other combinations of the same components does not improve the fit considerably. It is proposed to choose the largest model that is still sufficiently appropriate. Using examples from a range of different types of data, it is shown that the core consistency diagnostic is an effective tool for determining the appropriate number of components in e.g. PARAFAC models. However, it is also shown, using simulated data, that the theoretical understanding of CORCONDIA is not yet complete. Copyright © 2003 John Wiley & Sons, Ltd.
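The diagnostic compares a least-squares Tucker3 core, computed from the PARAFAC loadings, with the ideal superdiagonal core of ones. A minimal NumPy sketch for a three-way array follows. It assumes the commonly quoted form CC = 100 (1 - Σ(g - t)² / F), where F is the number of components; some implementations normalise differently. To keep the example short, the noisy case reuses the true loadings instead of fitting a PARAFAC model. The function name `core_consistency` is a hypothetical helper.

```python
import numpy as np

def core_consistency(X, A, B, C):
    """Hypothetical helper: core consistency for a three-way PARAFAC model.

    X: data array of shape (I, J, K); A, B, C: loading matrices with F columns.
    """
    F = A.shape[1]
    # Least-squares Tucker3 core for fixed loadings: multiply each mode of X
    # by the pseudoinverse of the corresponding loading matrix.
    G = np.einsum("ijk,ai,bj,ck->abc", X,
                  np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C))
    T = np.zeros((F, F, F))
    idx = np.arange(F)
    T[idx, idx, idx] = 1.0               # ideal superdiagonal core
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / F)

rng = np.random.default_rng(4)
I, J, K, F = 10, 8, 6, 3
A, B, C = rng.random((I, F)), rng.random((J, F)), rng.random((K, F))
X_exact = np.einsum("if,jf,kf->ijk", A, B, C)    # exactly trilinear data
X_noisy = X_exact + 0.05 * rng.normal(size=X_exact.shape)

cc_exact = core_consistency(X_exact, A, B, C)    # ideal model: 100
cc_noisy = core_consistency(X_noisy, A, B, C)    # noise pulls it below 100
```

In practice the loadings come from PARAFAC fits with 1, 2, 3, ... components. Following the abstract, one keeps the largest model whose core consistency is still acceptably high.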
Multi-way principal components-and PLS-analysis
Vol. 1, No. 1, pp. 41-56, 1987
Svante Wold, Paul Geladi, Kim H. Esbensen, Jerker Öhman
Partial least squares discriminant analysis: taking the magic away
Vol. 28, No. 4, pp. 213-225, 2014
Richard G. Brereton, Gavin R. Lloyd
Partial least squares discriminant analysis (PLS‐DA) has been available for nearly 20 years yet is poorly understood by most users. By simple examples, it is shown graphically and algebraically that for two equal class sizes, PLS‐DA using one partial least squares (PLS) component provides equivalent classification results to Euclidean distance to centroids, and by using all nonzero components to linear discriminant analysis. Extensions where there are unequal class sizes and more than two classes are discussed including common pitfalls and dilemmas. Finally, the problems of overfitting and PLS scores plots are discussed. It is concluded that for classification purposes, PLS‐DA has no significant advantages over traditional procedures and is an algorithm full of dangers. It should not be viewed as a single integrated method but as a step in a full classification procedure. However, despite these limitations, PLS‐DA can provide good insight into the causes of discrimination via weights and loadings, which gives it a unique role in exploratory data analysis, for example in metabolomics via visualisation of significant variables such as metabolites or spectroscopic peaks. Copyright © 2014 John Wiley & Sons, Ltd.
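The first equivalence in the abstract can be checked numerically. The sketch below fits one-component PLS-DA (PLS1 on a ±1 class code, thresholded at 0) and a nearest-centroid rule on two simulated classes of equal size, then compares their predictions. The ±1 coding, the zero threshold and the simulated data are our assumptions. This is a check on one example, not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)
n_per_class, p = 25, 4
X1 = rng.normal(loc=0.0, size=(n_per_class, p))
X2 = rng.normal(loc=1.0, size=(n_per_class, p))
X = np.vstack([X1, X2])
y = np.r_[np.ones(n_per_class), -np.ones(n_per_class)]   # +1 / -1 class code

# One-component PLS-DA: PLS1 on the coded response.
x_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - x_mean, y - y_mean
w = Xc.T @ yc                      # proportional to (class-1 mean - class-2 mean)
w /= np.linalg.norm(w)
t = Xc @ w
q = (t @ yc) / (t @ t)

def plsda_predict(Xnew):
    y_hat = (Xnew - x_mean) @ w * q + y_mean
    return np.where(y_hat > 0, 1, -1)    # threshold halfway between the codes

# Nearest-centroid rule (Euclidean distance to class means).
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

def centroid_predict(Xnew):
    d1 = np.linalg.norm(Xnew - m1, axis=1)
    d2 = np.linalg.norm(Xnew - m2, axis=1)
    return np.where(d1 < d2, 1, -1)

X_test = rng.normal(loc=0.5, size=(200, p))
agree = np.mean(plsda_predict(X_test) == centroid_predict(X_test))
print(agree)
```

The agreement follows from the algebra. With equal class sizes the overall mean is the midpoint of the two centroids, and the single PLS weight points along the difference of the class means. Both rules therefore reduce to the sign of (x - midpoint) · (m1 - m2).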
CV‐ANOVA for significance testing of PLS and OPLS® models
Vol. 22, No. 11-12, pp. 594-600, 2008
Lennart Eriksson, Johan Trygg, Svante Wold
This report describes significance testing for PLS and OPLS® (orthogonal PLS) models. The testing is applicable to single‐Y cases and is based on ANOVA of the cross‐validated residuals (CV‐ANOVA). Two variants of the CV‐ANOVA are introduced. The first is based on the cross‐validated predictive residuals of the PLS or OPLS model while the second works with the cross‐validated predictive score values of the OPLS model. The two CV‐ANOVA diagnostics are shown to work well in those cases where PLS and OPLS work well, that is, for data with many and correlated variables, missing data, etc. The utility of the CV‐ANOVA diagnostic is demonstrated using three datasets related to (i) the monitoring of an industrial de‐inking process; (ii) a pharmaceutical QSAR problem and (iii) a multivariate calibration application from a sugar refinery. Copyright © 2008 John Wiley & Sons, Ltd.
Practical aspects of PARAFAC modeling of fluorescence excitation‐emission data
Vol. 17, No. 4, pp. 200-215, 2003
Charlotte M. Andersen, Rasmus Bro
This paper presents a dedicated investigation and practical description of how to apply PARAFAC modeling to complicated fluorescence excitation–emission measurements. The steps involved in finding the optimal PARAFAC model are described in detail based on the characteristics of fluorescence data. These steps include choosing the right number of components, handling problems with missing values and scatter, detecting variables influenced by noise and identifying outliers. Various validation methods are applied in order to ensure that the optimal model has been found and several common data‐specific problems and their solutions are explained. Finally, interpretations of the specific models are given. The paper can be used as a tutorial for investigating fluorescence landscapes with multi‐way analysis. Copyright © 2003 John Wiley & Sons, Ltd.
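One common way to handle Rayleigh scatter is to declare the scatter bands as missing values, so that a PARAFAC algorithm able to handle missing data can skip them. The abstract links this to the treatment of scatter and missing values. The NumPy sketch below masks the first-order band (emission ≈ excitation) and, optionally, the second-order band (emission ≈ 2 × excitation). The band half-width of 10 nm, the layout (rows = emission, columns = excitation) and the name `mask_scatter` are illustrative assumptions; the paper's exact treatment may differ.

```python
import numpy as np

def mask_scatter(eem, ex, em, width=10.0, second_order=True):
    """Hypothetical helper: return a copy of an excitation-emission matrix
    (rows = emission, columns = excitation) with Rayleigh scatter bands set
    to NaN, i.e. treated as missing values."""
    eem = np.array(eem, dtype=float)               # copy; input stays untouched
    EM, EX = np.meshgrid(em, ex, indexing="ij")    # same shape as eem
    mask = np.abs(EM - EX) < width                 # first-order Rayleigh band
    if second_order:
        mask |= np.abs(EM - 2.0 * EX) < width      # second-order Rayleigh band
    eem[mask] = np.nan
    return eem

ex = np.arange(250.0, 451.0, 10.0)    # excitation wavelengths (nm)
em = np.arange(300.0, 601.0, 5.0)     # emission wavelengths (nm)
rng = np.random.default_rng(5)
eem = rng.random((em.size, ex.size))  # stand-in for a measured landscape
masked = mask_scatter(eem, ex, em, width=10.0)
```

A fixed band width is the simplest choice. In real data the scatter ridge widens with the instrument's slit settings, so the width is usually tuned by inspecting the landscapes.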
Comparison of the variable importance in projection (VIP) and of the selectivity ratio (SR) methods for variable selection and interpretation
Vol. 29, No. 10, pp. 528-536, 2015
Mireia Farrés, Stefan Platikanov, Stefan Tsakovski, Romà Tauler
This study compares the application of two variable selection methods in partial least squares regression (PLSR): the variable importance in projection (VIP) method and the selectivity ratio (SR) method. For this purpose, three different data sets were analysed: (a) physicochemical water quality parameters related to sensorial data, (b) gas chromatography–mass spectrometry (GC‐MS) chemical (organic compound) profiles from fossil sea sediment samples related to sea surface temperature (SST) changes, and (c) exposed genes of Daphnia magna female samples related to their total offspring production. Correlation coefficients (r), levels of significance (p‐values) and interpretation of the underlying experimental phenomena were used to discuss the best approach for variable selection in each case. The comparison of the two variable selection methods in the first water quality data set showed that the SR method is more accurate for sensorial prediction. For the climate data set, when raw total ion current (TIC) GC‐MS chromatograms were considered, variables selected using the VIP method were easier to interpret than those selected by the SR method. However, when only some chromatographic peak areas (concentrations) were considered, the SR method was more efficient for prediction, and the VIP method selected the most relevant variables for the interpretation of SST changes. Finally, for the transcriptomic data set, the SR method was again found to be more reliable for prediction purposes. Copyright © 2015 John Wiley & Sons, Ltd.
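To make the two criteria concrete, here is a NumPy sketch computing both from a single-response PLS model.

- VIP uses the commonly quoted form VIP_j = sqrt(p Σ_a SS_a (w_ja/‖w_a‖)² / Σ_a SS_a), with SS_a = q_a² t_aᵀt_a.
- SR follows the target-projection idea as we understand it: project X onto the normalised regression vector, then divide each variable's explained variance by its residual variance.

The helper names and the simulated data are hypothetical, and this is not the implementation used in the study.

```python
import numpy as np

def pls1_components(X, y, n_components):
    """Hypothetical helper: minimal PLS1 (NIPALS) returning W, T, P and q."""
    E = X - X.mean(axis=0)
    f = y - y.mean()
    W, T, P, q = [], [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        p = E.T @ t / (t @ t)
        c = f @ t / (t @ t)
        E = E - np.outer(t, p)           # deflate X
        f = f - c * t                    # deflate y
        W.append(w); T.append(t); P.append(p); q.append(c)
    return (np.column_stack(W), np.column_stack(T),
            np.column_stack(P), np.array(q))

def vip(W, T, q):
    """Variable importance in projection for a PLS1 model."""
    ss = q ** 2 * np.sum(T ** 2, axis=0)         # y-variance explained per component
    Wn = W / np.linalg.norm(W, axis=0)           # unit-norm weight columns
    return np.sqrt(W.shape[0] * (Wn ** 2 @ ss) / ss.sum())

def selectivity_ratio(X, b):
    """Explained / residual variance per variable after target projection on b."""
    Xc = X - X.mean(axis=0)
    w_tp = b / np.linalg.norm(b)
    t_tp = Xc @ w_tp                             # target-projected scores
    p_tp = Xc.T @ t_tp / (t_tp @ t_tp)           # target-projected loadings
    X_tp = np.outer(t_tp, p_tp)                  # part of X explained by the target
    E_tp = Xc - X_tp                             # residual part
    return np.sum(X_tp ** 2, axis=0) / np.sum(E_tp ** 2, axis=0)

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 5))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=50)    # only variable 0 is relevant
W, T, P, q = pls1_components(X, y, n_components=2)
b = W @ np.linalg.solve(P.T @ W, q)              # PLS regression vector
v = vip(W, T, q)
sr = selectivity_ratio(X, b)
```

With unit-norm weights the squared VIP scores always sum to the number of variables, which is why VIP > 1 is a popular cut-off. SR has no such normalisation, and its threshold is usually set with an F-test or chosen by judgement.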