Ensemble-based classifiers

Artificial Intelligence Review - Volume 33 - Pages 1–39 - 2009
Lior Rokach1
1Department of Information System Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel

Abstract

The idea of ensemble methodology is to build a predictive model by integrating multiple models. It is well known that ensemble methods can improve prediction performance, and researchers from disciplines as diverse as statistics and AI have considered their use. This paper reviews existing ensemble techniques and can serve as a tutorial for practitioners interested in building ensemble-based systems.
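To make the methodology concrete, below is a minimal sketch of one classic ensemble technique of the kind such a survey covers: bagging with majority voting, where several base classifiers are trained on bootstrap samples of the training set and their predictions are combined by vote. The choice of scikit-learn decision trees and a synthetic dataset is an illustrative assumption, not code from the paper.

```python
# A minimal sketch of the ensemble idea the abstract describes: train several
# base models on bootstrap samples of the data (bagging) and combine their
# predictions by majority vote. scikit-learn is assumed here purely for
# illustration; the paper itself is library-agnostic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Train each base classifier on a bootstrap sample (sampling with replacement).
models = []
for _ in range(25):
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    models.append(DecisionTreeClassifier().fit(X_tr[idx], y_tr[idx]))

# Combine by majority vote: the ensemble predicts each column's modal label.
votes = np.stack([m.predict(X_te) for m in models])  # shape: (25, n_test)
ensemble_pred = np.apply_along_axis(
    lambda col: np.bincount(col).argmax(), axis=0, arr=votes)

print(f"single tree: {models[0].score(X_te, y_te):.3f}  "
      f"ensemble: {(ensemble_pred == y_te).mean():.3f}")
```

On most runs the voted ensemble outperforms any single tree, which is the prediction-performance gain the abstract refers to.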
