MULTIVARIATE LINEAR QSPR/QSAR MODELS: RIGOROUS EVALUATION OF VARIABLE SELECTION FOR PLS

Computational and Structural Biotechnology Journal - Tập 5 - Trang e201302007 - 2013
Kurt Varmuza1, Peter Filzmoser2, Matthias Dehmer3
1Institute of Chemical Engineering, Vienna University of Technology, Austria
2Department of Statistics and Probability Theory, Vienna University of Technology, Austria
3Institute for Bioinformatics and Translational Research, UMIT – The Health and Life Sciences University, Hall in Tyrol, Austria

Tài liệu tham khảo

Todeschini, 2009, 2 volumes 2012 Varmuza, 2009 Varnek, 2012, Machine learning methods for property prediction in chemoinformatics: Quo vadis?, J Chem Inf Model, 52, 1413, 10.1021/ci200409x Lee, 1979, Retention indices for programmed-temperature capillary-column gas chromatography of polycyclic aromatic hydrocarbons, Anal Chem, 51, 768, 10.1021/ac50042a043 Corina, 2004, Software for the generation of high-quality three-dimensional molecular models Dragon, 2010, Software for molecular descriptor calculation, version 6.0 R, 2012 Varmuza, 2012 Filzmoser, 2009, Repeated double cross validation, J Chemometrics, 23, 160, 10.1002/cem.1225 Matlab, 2012 Octave, 2012 Ihaka, 1996, R: A language for data analysis and graphics, J Comput Graph Stat, 5, 299 Hornik, 2012, Are there too many R packages?, Austrian J Statistics, 41, 59, 10.17713/ajs.v41i1.188 Bioconductor Vandeginste, 1998 Wold, 2001, PLS-regression: a basic tool of chemometrics, Chemom Intell Lab Syst, 58, 109, 10.1016/S0169-7439(01)00155-1 de Jong, 1993, SIMPLS: an alternative approach to partial least squares regression, Chemom Intell Lab Syst, 18, 251, 10.1016/0169-7439(93)85002-X Hastie, 2001 Mevik, 2007, The pls package: Principal component and partial least squares regression in R, J Stat Software, 18, 1, 10.18637/jss.v018.i02 Filzmoser Garcia Tibshirani, 1996, Regression shrinkage and selection via the lasso, J R Statist Soc, 58, 267 Leardi, 2007, Genetic algorithms in chemistry, J Chromatogr A, 1158, 226, 10.1016/j.chroma.2007.04.025 Mercader, 2012, Partial-order ranking and linear modeling: Their use in predictive QSAR/QSPR studies, 149 CTfile, 2010 Grabner, 2012, RMol: A toolset for transforming SD/Molfile structure information into R objects, Source Code Biol Med, 7, 1 Mueller, 2011, QuACN: an R package for analyzing complex biological networks quantitatively, Bioinformatics, 27, 140, 10.1093/bioinformatics/btq606 Gasteiger, 1990, Automatic generation of 3D atomic coordinates for organic molecules, Tetrahedron Comp Method, 3, 537, 10.1016/0898-5529(90)90156-3 Renner, 2006, Impact of conformational flexibility on three-dimensional similarity searching using correlation vectors, J Chem Inf Model, 46, 2324, 10.1021/ci050075s Gasteiger, 2003 Liu, 2002, Molecular structural vector description and retention index of polycyclic aromatic hydrocarbons, Chemom Intell Lab Syst, 61, 2, 10.1016/S0169-7439(01)00146-0 Škrbić, 2004, Discrimination between linear and non-linear models for retention indices of polycyclic aromatic hydrocarbons in the so called Lee's scale, Chemom Intell Lab Syst, 72, 167, 10.1016/j.chemolab.2004.01.011 Frank, 1994 Liebminger, 2007, Multivariate models for the concentration of oxygen-18 in precipitation based on meteorological and geographical features, Chemom Intell Lab Syst, 89, 1, 10.1016/j.chemolab.2007.04.005 Hechinger, 2012, What is wrong with quantitative structure-property relations models based on three-dimensional descriptors?, J Chem Inf Model, 52, 1984, 10.1021/ci300246m