An Empirical Comparison of Pruning Methods for Decision Tree Induction
Abstract
This paper compares five methods for pruning decision trees, developed from sets of examples. When used with uncertain rather than deterministic data, decision-tree induction involves three main stages—creating a complete tree able to classify all the training examples, pruning this tree to give statistical reliability, and processing the pruned tree to improve understandability. This paper concerns the second stage—pruning. It presents empirical comparisons of the five methods across several domains. The results show that three methods—critical value, error complexity and reduced error—perform well, while the other two may cause problems. They also show that there is no significant interaction between the creation and pruning methods.