On Splitting Training and Validation Set: A Comparative Study of Cross-Validation, Bootstrap and Systematic Sampling for Estimating the Generalization Performance of Supervised Learning
Tóm tắt
Từ khóa
Tài liệu tham khảo
Hastie T, Tibshirani R, Friedman J. The elements of statistical learning (Springer series in statistics). 2nd ed. New York: Springer; 2009.
Westerhuis JA, Hoefsloot HCJ, Smit S, Vis DJ, Smilde AK, van Velzen EJJ, van Duijnhoven JPM, van Dorsten FA. Assessment of PLSDA cross validation. Metabolomics. 2008;4:81–9.
Harrington PD. Multiple versus single set validation of multivariate models to avoid mistakes. Crit Rev Anal Chem. 2017;48:33–46.
Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the fourteenth international joint conference on artificial intelligence. San Mateo: Morgan Kaufmann; 1995. p. 1137–43.
Daszykowski M, Walczak B, Massart DL. Representative subset selection. Anal Chim Acta. 2002;468:91–103.
Puzyn T, Mostrag-Szlichtyng A, Gajewicz A, Skrzyński M, Worth AP. Investigating the influence of data splitting on the predictive ability of QSAR/QSPR models. Struct Chem. 2011;22:795–804.
Harrington PD. Statistical validation of classification and calibration models using bootstrapped latin partitions. Trends Anal Chem. 2006;25:1112–24.
Galvão RKH, Araujo MCU, José GE, Pontes MJC, Silva EC, Saldanha TCB. A method for calibration and validation subset partitioning. Talanta. 2005;67:736–40.
Melnykov V, Chen WC, Maitra R. MixSim: an R package for simulating data to study performance of clustering algorithms. J Stat Softw. 2012;51:1–25.
Riani M, Cerioli A, Perrotta D, Torti F. Simulating mixtures of multivariate data with fixed cluster overlap in FSDA library. Adv Data Anal Classif. 2015;9:461–81.
Ballabio D, Consonni V. Classification tools in chemistry. Part 1: linear models. PLS-DA. Anal Methods. 2013;5:3790–8.
Gromski PS, Muhamadali H, Ellis DI, Xu Y, Correa E, Turner ML, Goodacre R. A tutorial review: metabolomics and partial least squares-discriminant analysis—a marriage of convenience or a shotgun wedding. Anal Chim Acta. 2015;879:10–23.
Xu Y, Zomer S, Brereton R. Support vector machines: a recent method for classification in chemometrics. Crit Rev Anal Chem. 2006;36:177–88.
Luts J, Ojeda F, de Plas RV, Moor BD, Huffel SV, Suykens JAK. A tutorial on support vector machine-based methods for classification problems in chemometrics. Anal Chim Acta. 2010;665:129–45.
Gromski PS, Xu Y, Correa E, Ellis DI, Turner ML, Goodacre R. A comparative investigation of modern feature selection and classification approaches for the analysis of mass spectrometry data. Anal Chim Acta. 2014;829:1–8.
FSDA toolbox is available at http://rosa.unipr.it/fsda.html . Accessed 29 May 2018.
LibSVM Toolbox is available at https://www.csie.ntu.edu.tw/~cjlin/libsvm/ . Accessed 29 May 2018.
Liblinear Toolbox is available at https://www.csie.ntu.edu.tw/~cjlin/liblinear/ . Accessed 29 May 2018.
Brereton RG. Chemometrics: data analysis for the laboratory and chemical plant. Chichester: Wiley; 2003.
Duda RO, Hart PE, Stork DG. Pattern classification. New York: Wiley; 2001.
Trivedi DK, Hollywood KA. Goodacre R metabolomics for the masses: the future of metabolomics in a personalized world. New Horiz Transl Med. 2017;3:294–305.
Broadhurst DI, Kell DB. Statistical strategies for avoiding false discoveries in metabolomics and related experiments. Metabolomics. 2006;2:171–96.
Rajer-Kanduč K, Zupan J, Majcen N. Separation of data on the training and test set for modelling: a case study for modelling of five colour properties of a white pigment. Chemom Intell Lab Syst. 2003;65:221–9.
Marini F, Magrì AL, Bucci R, Magrì AD. Use of different artificial neural networks to resolve binary blends of monocultivar Italian olive oils. Anal Chimica Acta. 2007;599:232–40.
Dunn WB, Lin W, Broadhurst D, Begley P, Brown M, Zelena E, Vaughan AA, Halsall A, Harding N, Knowles JD, Francis-McIntyre S, Tseng A, Ellis DI, O’Hagan S, Aarons G, Benjamin B, Chew-Graham S, Moseley C, Potter P, Winder CL, Potts C, Thornton P, McWhirter C, Zubair M, Pan M, Burns A, Cruickshank JK, Jayson GC, Purandare N, Wu FCW, Finn JD, Haselden JN, Nicholls AW, Wilson ID, Goodacre R, Kell DB. Molecular phenotyping of a UK population: defining the human serum metabolome. Metabolomics. 2015;11:9–26.