Wrappers for feature subset selection

Artificial Intelligence - Volume 97 - Pages 273-324 - 1997
Ron Kohavi1, George H. John2
1Data Mining and Visualization, Silicon Graphics, Inc., 2011 N. Shoreline Boulevard, Mountain View, CA 94043, USA
2Epiphany Marketing Software, 2141 Landings Drive, Mountain View, CA 94043, USA

References

Aha, 1992, Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms, Internat. J. Man-Machine Studies, 36, 267, 10.1016/0020-7373(92)90018-G
Aha, 1994, Feature selection for case-based classification of cloud types: an empirical comparison, 106
Aha, 1995, A comparative evaluation of sequential feature selection algorithms, 1
Aha, 1991, Instance-based learning algorithms, Machine Learning, 6, 37, 10.1007/BF00153759
Almuallim, 1991, Learning with many irrelevant features, 547
Almuallim, 1994, Learning Boolean concepts in the presence of many irrelevant features, Artificial Intelligence, 69, 279, 10.1016/0004-3702(94)90084-1
Anderson, 1992, Explorations of an incremental, Bayesian algorithm for categorization, Machine Learning, 9, 275, 10.1007/BF00994109
Atkeson, 1991, Using locally weighted regression for robot learning, 958
Bala, 1995, Hybrid learning using genetic algorithms and decision trees for pattern classification, 719
Ben-Bassat, 1982, Use of distance measures, information measures and error bounds in feature evaluation, Vol. 2, 773
Berliner, 1979, The B∗ tree search algorithm: a best-first proof procedure, Artificial Intelligence, 12, 23, 10.1016/0004-3702(79)90003-1
1981, 79
Blum, 1992, Training a 3-node neural network is NP-complete, Neural Networks, 5, 117, 10.1016/S0893-6080(05)80010-3
Boddy, 1989, Solving time-dependent planning problems, 979
Brazdil, 1994, Characterizing the applicability of classification algorithms using meta-level learning
Breiman, 1996, Bagging predictors, Machine Learning, 24, 123, 10.1007/BF00058655
Breiman, 1984
Buntine, 1992, Learning classification trees, Statist. and Comput., 2, 63, 10.1007/BF01889584
Cardie, 1993, Using decision trees to improve case-based learning, 25
Caruana, 1994, Greedy attribute selection, 28
Cestnik, 1990, Estimating probabilities: a crucial task in machine learning, 147
Cover, 1977, On the possible orderings in the measurement selection problem, IEEE Trans. Systems Man Cybernet., 7, 657, 10.1109/TSMC.1977.4309803
Dasarathy, 1990
De Mántaras, 1991, A distance-based attribute selection measure for decision tree induction, Machine Learning, 6, 81, 10.1023/A:1022694001379
Devijver, 1982
Doak, 1992, An evaluation of feature selection methods and their application to computer security
Domingos, 1996, Beyond independence: conditions for the optimality of the simple Bayesian classifier, 105
Dougherty, 1995, Supervised and unsupervised discretization of continuous features, 194
Draper, 1981
Duda, 1973
Fayyad, 1991, On the induction of decision trees for multiple concept learning
Fayyad, 1992, The attribute selection problem in decision tree generation, 104
Fong, 1995, A quantitative study of hypothesis selection, 226
Freund, 1990, Boosting a weak learning algorithm by majority, 202; also: Inform. and Comput., to appear
Freund, 1995, A decision-theoretic generalization of on-line learning and an application to boosting, 23
Furnival, 1974, Regression by leaps and bounds, Technometrics, 16, 499, 10.1080/00401706.1974.10489231
Geman, 1992, Neural networks and the bias/variance dilemma, Neural Comput., 1, 10.1162/neco.1992.4.1.1
Gennari, 1989, Models of incremental concept formation, Artificial Intelligence, 40, 11, 10.1016/0004-3702(89)90046-5
Ginsberg, 1993
Goldberg, 1989
Good, 1965
Greiner, 1992, Probabilistic hill climbing: theory and applications, 60
Hancock, 1989
Hoeffding, 1963, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc., 58, 13, 10.1080/01621459.1963.10500830
Holland, 1992
Hyafil, 1976, Constructing optimal binary decision trees is NP-complete, Inform. Process. Lett., 5, 15, 10.1016/0020-0190(76)90095-8
John, 1997, Enhancements to the data mining process
John, 1994, Irrelevant features and the subset selection problem, 121
Judd, 1988, On the complexity of loading shallow neural networks, J. Complexity, 4, 177, 10.1016/0885-064X(88)90019-2
Kaelbling, 1993
Kira, 1992, The feature selection problem: traditional methods and a new algorithm, 129
Kira, 1992, A practical approach to feature selection
Kittler, 1978, Une généralisation de quelques algorithmes sous-optimaux de recherche d'ensembles d'attributs
Kittler, 1986, 59
Kohavi, 1994, Feature subset selection as search with probabilistic estimates, 122
Kohavi, 1995, The power of decision tables, Vol. 914, 174
Kohavi, 1995, A study of cross-validation and bootstrap for accuracy estimation and model selection, 1137
Kohavi, 1995, Wrappers for performance enhancement and oblivious decision graphs
Kohavi, 1994, Useful feature subsets and rough set reducts, 310; also: in: Soft Computing by Lin and Wildberger
Kohavi, 1995, Automatic parameter selection by minimizing estimated error, 304
Kohavi, 1995, Feature subset selection using the wrapper model: overfitting and dynamic search space topology, 192
Kohavi, 1996, Bias plus variance decomposition for zero-one loss functions, 275
Kohavi, 1996, Data mining using MLC++: a machine learning library in C++, 234
Kononenko, 1994, Estimating attributes: analysis and extensions of Relief
Kononenko, 1995, On biases in estimating multi-valued attributes, 1034
Koza, 1992
Krogh, 1995, Neural network ensembles, cross validation, and active learning, Vol. 7
Kwok, 1990, Multiple decision trees, 327
Laarhoven, 1987
Langley, 1994, Selection of relevant features in machine learning, 140
Langley and Sage, 1994, Induction of selective Bayesian classifiers, 399
Langley, 1994, Oblivious decision trees and abstract cases, 113
Langley, 1992, An analysis of Bayesian classifiers, 223
Linhart, 1986
Littlestone, 1994, The weighted majority algorithm, Inform. and Comput., 108, 212, 10.1006/inco.1994.1009
Mallows, 1973, Some comments on Cp, Technometrics, 15, 661
Marill, 1963, On the effectiveness of receptors in recognition systems, IEEE Trans. Inform. Theory, 9, 11, 10.1109/TIT.1963.1057810
Maron, 1994, Hoeffding races: accelerating model selection search for classification and function approximation, Vol. 6
Merz, 1996
Miller, 1984, Selection of subsets of regression variables, J. Roy. Statist. Soc. A, 147, 389, 10.2307/2981576
Miller, 1990
Minsky, 1988
Mladenić, 1995, Automated model selection
Modrzejewski, 1993, Feature selection using rough sets theory, 213
Moore, 1994, Efficient algorithms for minimizing cross validation error
Moret, 1982, Decision trees and diagrams, ACM Comput. Surveys, 14, 593, 10.1145/356893.356898
Murthy, 1995, Lookahead and pathology in decision tree induction, 1025
Narendra, 1977, A branch and bound algorithm for feature subset selection, IEEE Trans. Comput., 26, 917, 10.1109/TC.1977.1674939
Neter, 1990
Pawlak, 1991
Pawlak, 1993, Rough sets: present state and the future, Found. Comput. Decision Sci., 18, 157
Pazzani, 1995, Searching for dependencies in Bayesian classifiers
Perrone, 1993, Improving regression estimation: averaging methods for variance reduction with extensions to general convex measure optimization
Provan, 1995, Learning Bayesian networks using feature selection, 450
Provost, 1992, Policies for the selection of bias in inductive machine learning
Provost, 1995, Inductive policy: the pragmatics of bias selection, Machine Learning, 20, 35, 10.1007/BF00993474
Quinlan, 1986, Induction of decision trees, Machine Learning, 1, 81, 10.1007/BF00116251
Quinlan, 1993
Quinlan, 1995, Oversearching and layered search in empirical learning, 1019
Rendell, 1990, Learning hard concepts through constructive induction: framework and rationale, Comput. Intell., 6, 247, 10.1111/j.1467-8640.1990.tb00298.x
Rosenblatt, 1958, The perceptron: a probabilistic model for information storage and organization in the brain, Psychological Review, 65, 386, 10.1037/h0042519
Russell, 1995
Schaffer, 1993, Selecting a classification method by cross-validation, Machine Learning, 13, 135, 10.1007/BF00993106
Schapire, 1990, The strength of weak learnability, Machine Learning, 5, 197, 10.1007/BF00116037
Siedlecki, 1988, On automatic feature selection, Internat. J. Pattern Recognition and Artificial Intelligence, 2, 197, 10.1142/S0218001488000145
Singh, 1995, A comparison of induction algorithms for selective and non-selective Bayesian classifiers, 497
Skalak, 1994, Prototype and feature selection by sampling and random mutation hill climbing algorithms
Street, 1995, An inductive learning approach to prognostic prediction
Taylor, 1994
Thrun, 1991, The MONK's problems: a performance comparison of different learning algorithms
Turney, 1993, Exploiting context when learning to classify, 402
Turney, 1996, The identification of context-sensitive features: a formal definition of context for concept learning, 53
Utgoff, 1994, An improved algorithm for incremental induction of decision trees, 318
Utgoff, 1995, Decision tree induction based on efficient tree restructuring
Vafaie, 1992, Genetic algorithms as a tool for feature selection in machine learning, 200
Vafaie, 1993, Robust feature selection algorithms, 356
Wolpert, 1992, On the connection between in-sample testing and generalization error, Complex Systems, 6, 47
Wolpert, 1992, Stacked generalization, Neural Networks, 5, 241, 10.1016/S0893-6080(05)80023-1
Xu, 1989, Best first strategy for feature selection, 706
Yan, 1992, Stochastic discrete optimization, SIAM J. Control and Optimization, 30, 594, 10.1137/0330034
Yu, 1993, A more efficient branch and bound algorithm for feature selection, Pattern Recognition, 26, 883, 10.1016/0031-3203(93)90054-Z
Ziarko, 1991, The discovery, analysis and representation of data dependencies in databases