Selective Sampling for Nearest Neighbor Classifiers
Tóm tắt
Most existing inductive learning algorithms work under the assumption that their training examples are already tagged. There are domains, however, where the tagging procedure requires significant computation resources or manual labor. In such cases, it may be beneficial for the learner to be active, intelligently selecting the examples for labeling with the goal of reducing the labeling cost. In this paper we present LSS—a lookahead algorithm for selective sampling of examples for nearest neighbor classifiers. The algorithm is looking for the example with the highest utility, taking its effect on the resulting classifier into account. Computing the expected utility of an example requires estimating the probability of its possible labels. We propose to use the random field model for this estimation. The LSS algorithm was evaluated empirically on seven real and artificial data sets, and its performance was compared to other selective sampling algorithms. The experiments show that the proposed algorithm outperforms other methods in terms of average error rate and stability.
Tài liệu tham khảo
Adler, R. J. (1981). The Geometry of Random Fields. John Wiley & Sons.
Aha, D. W., Kibler, D., & Albert, M. K. (1991). Instance-based learning algorithms. Machine Learning, 6:1, 37–66.
Angluin, D. (1988). Queries and concept learning. Machine Learning, 2:3, 319–342.
Blake, C., Keogh, E., & Merz, C. (1998). UCI Repository of machine learning databases, University of California, Irvine. [http://www.ics.uci.edu/~mlearn/MLRepository.html].
Cohn, D. A., Atlas, L., & Lander, R. (1994). Improving generalization with active learning. Machine Learning, 15:2, 201–221.
Cohn, D. A., Ghahramani, Z., & Jordan, M. I. (1995). Active learning with statistical models. In G. Tesauro, D. Touretzky, & T. Leen (Eds.), Advances in Neural Information Processing Systems, vol. 7 (pp. 705–712). The MIT Press.
Cover, T. M., & Hart, P. E. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13:1, 21–27.
Dagan, I., & Engelson, S. P. (1995). Committee-based sampling for training probabilistic classifiers. In A. Prieditis, & S. Russell (Eds.), Proceedings of the Twelfth International Conference on Machine Learning (pp. 150–157). Morgan Kaufmann.
Davis, D. T., & Hwang, J.-N. (1992). Attentional focus training by boundary region data selection. In Proceedings of International Joint Conference on Neural Networks, vol. 1 (pp. 676–681). IEEE Press.
DeGroot, M. H. (1986). Probability and Statistics, 2nd ed. Addison-Wesley.
Duda, R. O., & Hart, P. E. (1973). Pattern Classification and Scene Analysis. Wiley-Interscience.
Eldar, Y., Lindenbaum, M., Porat, M., & Zeevi, Y. Y. (1997). The farthest point strategy for progressive image sampling. IEEE Transactions on Image Processing, 6:9, 1305–1315.
Fedorov, V. V. (1972). Theory of Optimal Experiments. New York: Academic Press. Translation of Teoriia Optimalnogo Eksperimenta.
Freund, Y., Seung, H. S., Shamir, E., & Tishbi, N. (1997). Selective sampling using the query by committee algorithm. Machine Learning, 28:2/3, 133–168.
Hasenjager, M., & Ritter, H. (1996). Active learning of the generalized high-low-game. In C. von der Malsburg, W. von Seelen, J. C. Vorbrüggen, & B. Sendhoff (Eds.), Proceedings of the International Conference on Artificial Neural Networks, vol. 1112 of Lecture Notes in Computer Science (pp. xxv+922, 501–506). Springer-Verlag.
Hasenjager, M., & Ritter, H. (1998). Active learning with local models. Neural Processing Letters, 7:2, 107–117.
Lang, K. J., & Witbrock, M. J. (1988). Learning to tell two spirals apart. In D. Touretzky, G. Hinton, & T. Sejnowski (Eds.), Proceedings of the Connectionist Models Summer School (pp. 52–59). Morgan Kaufmann.
Lewis, D. D., & Catlett, J. (1994). Heterogeneous uncertainty sampling for supervised learning. In W. W. Cohen, & H. Hirsh (Eds.), Proceedings of the Eleventh International Conference on Machine Learning (pp. 148–156). New Brunswick, NJ: Rutgers University.
Lindenbaum, M., Markovitch, S., & Rusakov, D. (1999). Selective sampling for nearest neighbor classifiers. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (pp. 366–371). Orlando, Florida: AAAI Press.
MacKay, D. J. (1998). Introduction to gaussian processes. Neural Networks and Machine Learning. NATO ASI Series. Series F, Computer and System Sciences, 168, 133–166.
MacKay, D. J. C. (1992). Information-based objective functions for active data selection. Neural Computation, 4:4, 590–604.
Moore, A. W., Schneider, J. G., Boyan, J. A., & Lee, M. S. (1998). Q2: Memory-based active learning for optimizing noisy continuous functions. In Proceedings of the Fifteenth International Conference on Machine Learning (pp. 386–394). Morgan Kaufmann.
Murthy, S., & Salzberg, S. (1995). Lookahead and pathology in decision tree induction. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (pp. 1025–1031).
Papoulis, A. (1991). Probability, Random Variables, and Stochastic Processes, 3rd ed. McGraw-Hill.
RayChaudhuri, T., & Hamey, L. (1995). Minimisation of data collection by active learning. In IEEE International Conference on Neural Networks Proceedings, vol. 3 (pp. 6 vol. 1+3219, 1338–1341). IEEE Press.
Russell, S. J., & Norvig, P. (1995). Artificial Intelligence, A Modern Approach. Prentice Hall.
Seung, H. S., Opper, M., & Sompolinsky, H. (1992). Query by committee. In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory (pp. v+452, 287–294). New York: ACM.
Smyth, B., & McKenna, E. (1999). Building compact competent case-bases. In Lecture Notes in Artificial Intelligence, number 1650 in Lecture Notes in Computer Science (pp. 329–342). Springer.
Tan, M., & Schlimmer, J. C. (1990). Two case studies in cost-sensitive concept acquisition. In Proceedings of the Eighth National Conference on Artificial Intelligence (pp. 854–860). AAAI Press.
Turney, P. D. (1995). Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm. Journal of Artificial Intelligence Research, 2, 369–409.
Williams, C. K. I., & Barber, D. (1998). Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20:12, 1342.
Wilson, D. R., & Martinez, T. R. (2000). Reduction techniques for instance-based learning algorithms. Machine Learning, 38:3, 257–286.
Wong, E., & Hajek, B. (1985). Stochastic Processes in Engineering Systems. Springer-Verlag.
Zhang, J., Yim, Y.-S., & Yang, J. (1997). Intelligent selection of instances for prediction functions in lazy learning algorithms. Artificial Intelligence Review, 11:1–5, 175–191.