A review of instance selection methods

Artificial Intelligence Review - Tập 34 - Trang 133-143 - 2010
J. Arturo Olvera-López1, J. Ariel Carrasco-Ochoa2, J. Francisco Martínez-Trinidad2, Josef Kittler3
1Benemérita Universidad Autónoma de puebla, Facultad de Ciencias de la Computación, Puebla, Mexico
2National Institute of Astrophysics, Optics and Electronics, Computer Science Department, Puebla, Mexico
3University of Surrey, Center for Vision, Speech and Signal Processing, Guilford, UK

Tóm tắt

In supervised learning, a training set providing previously known information is used to classify new instances. Commonly, several instances are stored in the training set but some of them are not useful for classifying therefore it is possible to get acceptable classification rates ignoring non useful cases; this process is known as instance selection. Through instance selection the training set is reduced which allows reducing runtimes in the classification and/or training stages of classifiers. This work is focused on presenting a survey of the main instance selection methods reported in the literature.

Tài liệu tham khảo

Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6: 37–66 Bezdek JC, Kuncheva LI (2001) Nearest prototype classifier designs: an experimental study. Int J Hybrid Intell Syst 16(12): 1445–1473 Brighton H, Mellish C (2002) Advances in instance selection for instance-based learning algorithms. Data Min Knowl Discov 6(2): 153–172 Blum AL, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97: 245–271 Caises Y, González A, Leyva E, Pérez R (2009) SCIS: combining instance selection methods to increase their effectiveness over a wide range of domains. In: Corchado E, Yin H (eds) IDEAL 2009, LNCS 5788. Burgos, Spain, pp 17–24 Cano JR, Herrera F, Lozano M (2005) Stratification for scaling up evolutionary prototype selection. Pattern Recognit Lett 26: 953–963 Cano JR, Herrera F, Lozano M (2003) Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study. IEEE Trans Evol Comput 7(6): 561–575 Cerverón V, Ferri FJ (2001) Another move toward the minimum consistent subset: a tabu search approach to the condensed nearest neighbour rule. IEEE Trans Syst Man Cybern B 31(3): 408–413 Chien-Hsing C, Bo-Han K, Fu C (2006) The generalized condensed nearest neighbor rule as a data reduction method. In: Proceedings of the 18th international conference on pattern recognition. IEEE Computer Society, Hong-Kong, pp 556–559 Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13: 21–27 De Haro-García A, García-Pedrajas N (2009) A divide-and-conquer approach for scaling up instance sele ction algorithm. Data Min Knowl Discov 18: 392–418 Devijver PA, Kittler J (1980) On the edited nearest neighbor rule. In: Proceedings of the 5th international conference on pattern recognition. Los Alamitos, CA, pp 72–80 Friedman JH, Bentley JL, Finkel RA (1997) An algorithm for finding best matches in logarithmic expected time. ACM Trans Math Softw 3(3): 209–226 Garain U (2008) Prototype reduction using an artificial immune model. Pattern Anal Appl 11: 353–363 García S, Cano JR, Herera F (2008) A memetic algorithm for evolutionary prototype selection: a scaling up approach. Pattern Recognit 41: 2693–2709 Glover F (1986) The general employee scheduling problem: an integration of management science and artificial intelligence. Comput Oper Res 13(4): 563–593 Grochowski M, Jankowski N et al (2004) Comparison of instance selection algorithms II. In: Results , comments. Rutkowski L (eds) ICAISC 2004, LNAI. Zacopane, Poland, pp 580–585 Hart PE (1968) The condensed nearest neighbor rule. IEEE Trans Inf Theory 14: 515–516 Ke-Ping Z, Shui-Geng Z, Ji-Hong G, Ao-Ying A (2003) C-Pruner: An improved instance pruning algorithm. In: Proceedings of 2nd IEEE international conference on machine learning and cybernetics, vol 1. pp 94–99 Kittler J (1986) Feature selection and extraction. In: Young TY, Fu KS (eds) Handbook of pattern recognition and image processing. Academic Press, New York, pp 203–217 Kuncheva LI (1995) Editing for the k-nearest neighbors rule by a genetic algorithm. Pattern Recognit Lett 16: 809–814 Kuncheva LI (1997) Fitness functions in editing k-NN referent set by genetic algorithms. Pattern Recognit 30: 1041–1049 Kuncheva LI, Bezdek JC (1998) Nearest prototype classification, clustering, genetic algorithms, or random search?. IEEE Trans Syst Man Cybern C 28(1): 160–164 Liu H, Motoda H (2002) On issues of instance selection. Data Min Knowl Discov 6: 115–130 Lumini A, Nanni L (2006) A clustering method for automatic biometric template selection. Pattern Recognit 39: 495–497 Mollineda RA, Ferri FJ, Vidal E (2002) An efficient prototype merging strategy for the condensed 1-NN rule through class-conditional hierarchical clustering. Pattern Recognit 35: 2771–2782 Narayan BL, Murthy CA, Pal SK (2006) Maxdiff kd-trees for data condensation. Pattern Recognit Lett 27: 187–200 Olvera-López JA, Carrasco-Ochoa JA, Martínez-Trinidad JF (2005) Sequential search for decremental edition. In: Gallagher M, Hogan J, Maire F (eds) LNCS 3578: IDEAL 2005. Queensland, Australia, pp 280–285 Olvera-López JA, Martínez-Trinidad JF, Carrasco-Ochoa JA (2007a) Restricted sequential floating search applied to object selection. In: Perner P (eds) MLDM 2007:LNAI 4571. Leipzig, Germany, pp 694–702 Olvera-López JA, Carrasco-Ochoa JA, Martínez-Trinidad JF et al (2007) Object selection based on clustering and border objects. In: Kurzynski M (eds) Computer recognition systems 2, ASC 45. Wroclaw, Poland, pp 27–34 Olvera-López JA, Carrasco-Ochoa JA, Martínez-Trinidad JF (2008) Prototype selection via prototype relevance. In: Ruiz-Shulcloper J, Kropatsch WG (eds) CIARP 2008, LNCS 5197. Habana, Cuba, pp 153–160 Olvera-López JA, Martínez-Trinidad JF, Carrasco-Ochoa JA, Kittler J (2009) Prototype selection based on sequeintial search. Intell Data Anal 13(4): 599–631 Paredes R, Vidal E (2000) Weighting prototypes. A new editing approach. In: Proceedings of the international conference on pattern recognition ICPR, vol. 2. pp 25–28 Pudil P, Ferri FJ, Novovicová J, Kittler J (1994) Floating search methods for feature selection with nonmonotonic criterion functions. In: Proceedings of the 12th international conference on pattern recognition. IEEE Computer Society Press, pp 279–283 Raicharoen T, Lursinsap C (2005) A divide-and-conquer approach to the pairwise opposite class-nearest neighbor (POC-NN) algorithm. Pattern Recognit Lett 26(10): 1554–1567 Ritter GL, Woodruff HB, Lowry SR, Isenhour TL (1975) An algorithm for a selective nearest neighbor decision rule. IEEE Trans Inf Theory 21(6): 665–669 Riquelme JC, Aguilar-Ruíz JS, Toro M (2003) Finding representative patterns with ordered projections. Pattern Recognit 36: 1009–1018 Srisawat A, Phienthrakul T, Kijsirikul B (2006) SV-kNNC: an algorithm for improving the efficency of k-Nearest neighbr. In: Yang Q, Webb G (eds) PRICAI 2006:LNAI 4099. Guilin, China, pp 975–979 Spillmann B, Neuhaus M, Bunke H, Pȩkalska E, Duin RPW (2006) Transforming strings to vector spaces using prototype selection. In: Yeung D-Y et al (eds) SSPR&SPR 2006, LNCS 4109. Hong-Kong, pp. 287–296 Tomek I (1976) An experiment with the edited nearest-neighbor rule. IEEE Trans Syst Man Cybern 6-6: 448–452 Vapnik V (1995) The nature of statistical learning theory. Springer, New York Vázquez F, Sánchez S, Pla F et al (2005) A stochastic approach to Wilson’s editing algorithm. In: Marques JS (eds) IbPRIA 2005, LNCS 3523. Estoril, Portugal, pp 35–42 Venmann CJ, Reinders MJT (2005) The nearest sub-class classifier: a compromise between the nearest mean and nearest neighbor classifier. IEEE Trans Pattern Anal Mach Intell 27(9): 1417–1429 Venmann CJ, Reinders MJT, Backer E (2002) A maximum variance clustering algorithm. IEEE Trans Pattern Anal Mach Intell 24(9): 1273–1280 Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern 2: 408–421 Wilson DR, Martínez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38: 257–286 Yuangui L, Zhonhui H, Yunze C, Weidong Z et al (2005) Support vector based prototype selection method for nearest neighbor rules. In: Wang L (eds) ICNC 2005, LNCS 3610. Changsha, China, pp 528–535 Zhang H, Sun G (2002) Optimal reference subset selection for nearest neighbor classification by tabu search. Pattern Recognit 35: 1481–1490