Learning from data: a tutorial with emphasis on modern pattern recognition methods

IEEE Sensors Journal - Volume 2, Issue 3 - Pages 203-217 - 2002
M. Pardo1, G. Sberveglieri1
1Department of Chemistry and Physics, University of Brescia, Brescia, Italy

Abstract

This tutorial has two purposes. First, it reviews the classical statistical learning scenario, highlighting its fundamental taxonomies and key aspects. Second, it introduces some modern ensemble methods developed within the machine learning field. The tutorial starts by placing supervised learning in the broader context of data analysis and by reviewing the classical pattern recognition methods: those based on class-conditional density estimation and the Bayes theorem, and those based on discriminant functions. The fundamental topic of complexity control is treated in some detail. Ensemble techniques have drawn considerable attention in recent years: a set of learning machines can increase classification accuracy with respect to a single machine. Here, we introduce boosting, in which classifiers adaptively concentrate on the harder examples located near the classification boundary, and output coding, in which a set of independent two-class machines solves a multiclass problem. The first successful applications of these methods to data produced by the Pico-2 electronic nose (EN), developed at the University of Brescia, Brescia, Italy, are also briefly shown.
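
The two ensemble ideas named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the AdaBoost-style update rule chosen to stand in for "boosting", and the 7-bit code matrix are all assumptions made for illustration. The first function performs one reweighting step, so that misclassified (harder) examples receive more weight in the next round; the second decodes the outputs of several two-class machines into a multiclass label by nearest Hamming distance.

```python
import math


def adaboost_reweight(weights, labels, predictions):
    """One AdaBoost-style reweighting step (illustrative, not the paper's code).

    weights: example weights summing to 1
    labels, predictions: values in {-1, +1}
    Returns the classifier weight alpha and the renormalised example weights.
    """
    # Weighted error of the current classifier.
    err = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified examples (y * h == -1) are scaled up, correct ones down.
    updated = [w * math.exp(-alpha * y * h)
               for w, y, h in zip(weights, labels, predictions)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Hypothetical code matrix: one row (codeword) per class, one column per
# two-class machine. Any two rows differ in 4 positions, so a single
# wrong binary output can still be decoded correctly.
CODE_MATRIX = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0],
]


def decode_output_code(bits, code_matrix=CODE_MATRIX):
    """Return the index of the class whose codeword is nearest in Hamming distance."""
    distances = [sum(b != c for b, c in zip(bits, row)) for row in code_matrix]
    return distances.index(min(distances))


# Four equally weighted examples; the classifier gets only the last one wrong.
alpha, new_weights = adaboost_reweight([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])

# Codeword of class 2 with its first bit flipped by one faulty machine.
predicted_class = decode_output_code([1, 0, 1, 1, 0, 0, 1])
```

In this toy run the single misclassified example ends up carrying half of the total weight after renormalisation, and the corrupted codeword still decodes to class 2. A real application would train actual base classifiers between reweighting steps.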

Keywords

#Tutorial #Pattern recognition #Sensor arrays #Data analysis #Electronic noses #Principal component analysis #Chemical sensors #Machine learning #Neural networks #Sensor phenomena and characterization
