Selection of relevant features and examples in machine learning

Artificial Intelligence - Volume 97 - Pages 245-271 - 1997
Avrim L. Blum1, Pat Langley2,3
1School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213-3891, USA
2Institute for the Study of Learning and Expertise, 2164 Staunton Court, Palo Alto, CA 94306, USA
3Intelligent Systems Laboratory, Daimler-Benz Research and Technology Center, 1510 Page Mill Road, Palo Alto, CA 94304, USA

References

Aha, 1990, A study of instance-based algorithms for supervised learning tasks: mathematical, empirical and psychological evaluations
Aha, 1996, A comparative evaluation of sequential feature selection algorithms
Almuallim, 1991, Learning with many irrelevant features, 547
Angluin, 1987, Learning regular sets from queries and counterexamples, Inform. and Comput., 75, 87, 10.1016/0890-5401(87)90052-6
Angluin, 1993, Learning read-once formulas with queries, J. ACM, 40, 185, 10.1145/138027.138061
Armstrong, 1993, Webwatcher: a learning apprentice for the World Wide Web
Baluja, 1997, Dynamic relevance: vision-based focus of attention using artificial neural networks (Technical Note), Artificial Intelligence, 97, 381, 10.1016/S0004-3702(97)00065-9
Blum, 1992, Learning Boolean functions in an infinite attribute space, Machine Learning, 9, 373, 10.1007/BF00994112
Blum, 1995, Empirical support for winnow and weighted-majority based algorithms: results on a calendar scheduling domain, 64
Blum, 1994, Weakly learning DNF and characterizing statistical query learning using Fourier analysis, 253
Blum, 1995, Learning in the presence of finitely or infinitely many irrelevant attributes, J. Comput. System Sci., 50, 32, 10.1006/jcss.1995.1004
Blum, 1993, Learning an intersection of k halfspaces over a uniform distribution, 312
Blumer, 1987, Occam's razor, Inform. Process. Lett., 24, 377, 10.1016/0020-0190(87)90114-1
Blumer, 1989, Learnability and the Vapnik-Chervonenkis dimension, J. ACM, 36, 929, 10.1145/76359.76371
Breiman, 1984
Bshouty, 1993, Exact learning via the monotone theory, 302
Cardie, 1993, Using decision trees to improve case-based learning, 25
Caruana, 1994, Greedy attribute selection, 28
Caruana, 1994, How useful is relevance?, 25
Catlett, 1992, Peepholing: choosing attributes efficiently for megainduction, 49
Cesa-Bianchi, 1993, How to use expert advice, 382
Clark, 1989, The CN2 induction algorithm, Machine Learning, 3, 261, 10.1007/BF00116835
Cohn, 1996, Active learning with statistical models, J. Artif. Intell. Research, 4, 129, 10.1613/jair.295
Comon, 1994, Independent component analysis: a new concept, Signal Process., 36, 287, 10.1016/0165-1684(94)90029-9
Cover, 1967, Nearest neighbor pattern classification, IEEE Trans. Inform. Theory, 13, 21, 10.1109/TIT.1967.1053964
Daelemans, 1994, The acquisition of stress: a data-oriented approach, Comput. Linguistics, 20, 421
Devijver, 1982
Dhagat, 1994, PAC learning with irrelevant attributes, 64
Doak, 1992, An evaluation of feature-selection methods and their application to computer security
Drucker, 1992, Improving performance in neural networks using a boosting algorithm, Vol. 4
Drucker, 1994, Boosting and other machine learning algorithms, 53
Dyer, 1989, A random polynomial time algorithm for approximating the volume of convex bodies, 375
Freund, 1990, Boosting a weak learning algorithm by majority, 202
Freund, 1992, An improved boosting algorithm and its implications on learning complexity, 391
Garey, 1979
Gil, 1993, Efficient domain-independent experimentation, 128
Greiner, 1997, Knowing what doesn't matter: exploiting the omission of irrelevant data, Artificial Intelligence, 97, 345, 10.1016/S0004-3702(97)00048-9
Gross, 1991, Concept acquisition through attribute evolution and experiment selection
Haussler, 1986, Quantifying the inductive bias in concept learning, 485
Holte, 1993, Very simple classification rules perform well on most commonly used domains, Machine Learning, 11, 63, 10.1023/A:1022631118932
Jackson, 1994, An efficient membership-query algorithm for learning DNF with respect to the uniform distribution
John, 1994, Irrelevant features and the subset selection problem, 121
John, 1996, Static vs. dynamic sampling for data mining, 367
Jolliffe, 1986
Johnson, 1974, Approximation algorithms for combinatorial problems, J. Comput. System Sci., 9, 256, 10.1016/S0022-0000(74)80044-9
Kearns, 1994
Kira, 1992, A practical approach to feature selection, 249
Kivinen, 1995, Additive versus exponentiated gradient updates for linear prediction, 209
Kivinen, 1997, The Perceptron algorithm versus Winnow: linear versus logarithmic mistake bounds when few input variables are relevant (Technical Note), Artificial Intelligence, 97, 325, 10.1016/S0004-3702(97)00039-8
Kohavi, 1995, The power of decision tables
Kohavi, 1997, Wrappers for feature subset selection, Artificial Intelligence, 97, 273, 10.1016/S0004-3702(97)00043-X
Kohavi, 1997, The utility of feature weighting in nearest-neighbor algorithms
Knobe, 1977, A method for inferring context-free grammars, Inform. and Control, 31, 129, 10.1016/S0019-9958(76)80003-4
Koller, 1996, Toward optimal feature selection
Kononenko, 1994, Estimating attributes: analysis and extensions of RELIEF
Kubat, 1993, Discovering patterns in EEG signals: comparative study of a few methods, 367
Kulkarni, 1990, Experimentation in machine discovery
Langley, 1993, Average-case analysis of a nearest neighbor algorithm, 889
Langley, 1994, Oblivious decision trees and abstract cases, 113
Langley, 1994, Induction of selective Bayesian classifiers, 399
Langley, 1997, Scaling to domains with many irrelevant features, Vol. 4
Lewis, 1992, Representation and learning in information retrieval
Lewis, 1992, Feature selection and feature extraction for text categorization, 212
Lewis, 1994, Heterogeneous uncertainty sampling, 148
Lin, 1992, Self-improving reactive agents based on reinforcement learning, planning and teaching, Machine Learning, 8, 293, 10.1007/BF00992699
Littlestone, 1988, Learning quickly when irrelevant attributes abound: a new linear threshold algorithm, Machine Learning, 2, 285, 10.1007/BF00116827
Littlestone, 1991, On-line learning of linear functions, 465
Littlestone, 1997, An apobayesian relative of winnow, Vol. 9
Littlestone, 1994, The weighted majority algorithm, Inform. and Comput., 108, 212, 10.1006/inco.1994.1009
Lovász, 1992, On the randomized complexity of volume and diameter, 482
Lund, 1993, On the hardness of approximating minimization problems, 286
Matheus, 1989, Constructive induction on decision trees, 645
Michalski, 1980, Pattern recognition as rule-guided inductive inference, IEEE Trans. Pattern Anal. Machine Intell., 2, 349, 10.1109/TPAMI.1980.4767034
Minsky, 1969
Mitchell, 1982, Generalization as search, Artificial Intelligence, 18, 203, 10.1016/0004-3702(82)90040-6
Moore, 1994, Efficient algorithms for minimizing cross validation error, 190
Norton, 1989, Generating better decision trees, 800
Pazzani, 1992, A framework for the average case analysis of conjunctive learning algorithms, Machine Learning, 9, 349, 10.1007/BF00994111
Pagallo, 1990, Boolean feature discovery in empirical learning, Machine Learning, 5, 71, 10.1023/A:1022611825350
Quinlan, 1983, Learning efficient classification procedures and their application to chess end games
Quinlan, 1993
Rajamoney, 1990, A computational approach to theory revision
Rivest, 1993, Inference of finite automata using homing sequences, Inform. and Comput., 103, 299, 10.1006/inco.1993.1021
Rumelhart, 1986, Learning internal representations by error propagation, Vol. 1
Sammut, 1986, Learning concepts by asking questions, Vol. 2
Schapire, 1990, The strength of weak learnability, Machine Learning, 5, 197, 10.1007/BF00116037
Schlimmer, 1993, Efficiently inducing determinations: a complete and efficient search algorithm that uses optimal pruning, 284
Scott, 1991, Representation generation in an exploratory learning system
Seung, 1992, Query by committee, 287
Shen, 1989, Rule creation and rule learning through environmental exploration, 675
Sinclair, 1989, Approximate counting, uniform generation and rapidly mixing Markov chains, Inform. and Comput., 82, 93, 10.1016/0890-5401(89)90067-9
Singh, 1995, A comparison of induction algorithms for selective and non-selective Bayesian classifiers, 497
Singh, 1996, Efficient learning of selective Bayesian network classifiers
Skalak, 1994, Prototype and feature selection by sampling and random mutation hill-climbing algorithms, 293
Stanfill, 1987, Memory-based reasoning applied to English pronunciation, 577
Ting, 1994, Discretization of continuous-valued attributes and instance-based learning
Townsend-Weber, 1994, Instance-based prediction of continuous values, 30
Verbeurgt, 1990, Learning DNF under the uniform distribution in polynomial time, 314
Vere, 1975, Induction of concepts in the predicate calculus, 281
Vovk, 1990, Aggregating strategies, 371
Widrow, 1960, Adaptive switching circuits, 96
Winston, 1975, Learning structural descriptions from examples