Progressive sampling-based Bayesian optimization for efficient and automatic machine learning model selection
Abstract
Machine learning is widely used for clinical data analysis. Before a model can be trained, a machine learning algorithm must be selected, and values must be set for one or more model parameters known as hyper-parameters. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to using machine learning, various methods have been proposed for automatically selecting algorithms and/or hyper-parameter values. However, existing automatic selection methods are inefficient on large data sets, which poses a challenge for applying machine learning in the clinical big data era. To address this challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. We report an implementation of the method and show that, compared to a state-of-the-art automatic selection method, our method can significantly reduce search time, classification error rate, and the standard deviation of the error rate due to randomization. This is major progress towards enabling the fast turnaround in identifying high-quality solutions that many machine learning-based clinical data analysis tasks require.
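The high-level idea of pairing progressive sampling with automatic model selection can be illustrated with a short sketch. The sketch below is not the paper's implementation: it evaluates a set of candidate algorithm and hyper-parameter configurations on progressively larger random samples of the training data, discarding the weaker half at each round so that only promising configurations are ever evaluated on larger samples. For simplicity, candidates are drawn at random from a small hypothetical search space of two scikit-learn classifiers, whereas the method described here proposes configurations via Bayesian optimization; the starting sample fraction, the halving schedule, and the candidate space are all illustrative assumptions.

    # Illustrative sketch only (assumptions noted above), not the authors' method:
    # score candidates on progressively larger training samples and keep the
    # better half each round, so only promising configurations see more data.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score, train_test_split

    rng = np.random.RandomState(0)

    def random_candidate():
        """Draw one (algorithm, hyper-parameter) configuration at random.
        The paper's method would instead propose this configuration with
        Bayesian optimization; random proposal keeps the sketch minimal."""
        if rng.rand() < 0.5:
            return RandomForestClassifier(
                n_estimators=int(rng.choice([50, 100, 200])),
                max_depth=int(rng.choice([3, 5, 10])),
                random_state=0)
        return LogisticRegression(C=float(10 ** rng.uniform(-2, 2)), max_iter=1000)

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    candidates = [random_candidate() for _ in range(16)]
    sample_fraction = 0.1                      # start with a small training sample

    while len(candidates) > 1 and sample_fraction < 1.0:
        n = int(sample_fraction * len(X_train))
        idx = rng.choice(len(X_train), size=n, replace=False)
        # Score every surviving candidate on the current (small) sample.
        scores = [cross_val_score(m, X_train[idx], y_train[idx], cv=3).mean()
                  for m in candidates]
        # Keep the better-scoring half and double the sample size for the next round.
        keep = np.argsort(scores)[len(candidates) // 2:]
        candidates = [candidates[i] for i in keep]
        sample_fraction *= 2

    best = candidates[0].fit(X_train, y_train)
    print("Selected model:", best)
    print("Test accuracy: %.3f" % best.score(X_test, y_test))

The design rationale behind such a scheme is that clearly poor configurations can be eliminated cheaply on small samples, so the cost of training on the full data set is paid only for the configuration that survives all rounds.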