Comparison of the variable importance in projection (VIP) and of the selectivity ratio (SR) methods for variable selection and interpretation

Journal of Chemometrics - Volume 29, Issue 10 - Pages 528-536 - 2015
Mireia Farrés1, Stefan Platikanov1, Stefan Tsakovski2, Romà Tauler1
1Department of Environmental Chemistry, IDAEA‐CSIC, Jordi Girona 18, 08034 Barcelona, Spain
2Department of Analytical Chemistry, Faculty of Chemistry, Sofia University, James Bourchier Blvd, 1164 Sofia, Bulgaria

Abstract

This study compares two variable selection methods for partial least squares regression (PLSR): the variable importance in projection (VIP) method and the selectivity ratio (SR) method. Three data sets were analysed: (a) physicochemical water quality parameters related to sensorial data, (b) gas chromatography–mass spectrometry (GC‐MS) chemical (organic compound) profiles from fossil sea sediment samples related to sea surface temperature (SST) changes, and (c) exposed genes of Daphnia magna female samples related to their total offspring production. The best approach for variable selection in each case was assessed using correlation coefficients (r), levels of significance (p‐value) and the interpretation of the underlying experimental phenomena. For the water quality data set, the SR method was more accurate for sensorial prediction. For the climate data set, when raw total ion current (TIC) GC‐MS chromatograms were considered, the variables selected by the VIP method were easier to interpret than those selected by the SR method. However, when only some chromatographic peak areas (concentrations) were considered, the SR method was more efficient for prediction, whereas the VIP method selected the variables most relevant for the interpretation of SST changes. Finally, for the transcriptomic data set, the SR method was again found to be more reliable for prediction purposes. Copyright © 2015 John Wiley & Sons, Ltd.
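The abstract names the two methods without stating how they are computed. As a rough illustration only, the sketch below uses the definitions commonly found in the chemometrics literature, which are an assumption here and not taken from this paper: VIP weights each variable's squared PLS weight by the y-variance explained by each component, and SR is the ratio of explained to residual variance of each variable after projecting X onto the regression vector (target projection). The function names, the single-response NIPALS fit, the synthetic data and the choice of two components are all hypothetical.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Fit a single-response PLS model by NIPALS on mean-centred data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    E, f = Xc.copy(), yc.copy()          # deflated X and y
    n, p = Xc.shape
    W = np.zeros((p, n_components))      # unit-norm weight vectors
    P = np.zeros((p, n_components))      # X loadings
    T = np.zeros((n, n_components))      # scores
    q = np.zeros(n_components)           # y loadings
    for a in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p_a = E.T @ t / tt
        q[a] = f @ t / tt
        E -= np.outer(t, p_a)            # deflate X
        f -= q[a] * t                    # deflate y
        W[:, a], P[:, a], T[:, a] = w, p_a, t
    b = W @ np.linalg.solve(P.T @ W, q)  # regression vector for centred X
    return Xc, W, T, q, b

def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a(SS_a * w_ja^2) / sum_a(SS_a)), W columns unit norm."""
    p = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)     # y-variance explained per component
    return np.sqrt(p * (W**2 @ ss) / ss.sum())

def selectivity_ratio(Xc, b):
    """SR_j = explained / residual variance of variable j on the target projection."""
    t_tp = Xc @ b / np.linalg.norm(b)    # target-projected scores
    p_tp = Xc.T @ t_tp / (t_tp @ t_tp)   # target-projected loadings
    X_tp = np.outer(t_tp, p_tp)
    resid = Xc - X_tp
    return np.sum(X_tp**2, axis=0) / np.sum(resid**2, axis=0)

# Synthetic data for illustration only (not one of the paper's data sets).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=40)

Xc, W, T, q, b = pls1_nipals(X, y, n_components=2)
vip = vip_scores(W, T, q)
sr = selectivity_ratio(Xc, b)
print("VIP:", np.round(vip, 2))
print("SR: ", np.round(sr, 2))
```

Under this VIP definition the squared scores sum to the number of variables, which is why a threshold of VIP > 1 is often used as a rule of thumb; SR thresholds are usually derived from an F-test instead. Whether the authors applied exactly these thresholds is not stated in the abstract.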
