Learning pattern classification-a survey

IEEE Transactions on Information Theory - Vol. 44, No. 6 - pp. 2178-2206 - 1998
Sanjeev R. Kulkarni1, Gábor Lugosi2, Santosh S. Venkatesh3
1Department of Electrical Engineering, Princeton University, Princeton, NJ, USA
2Department of Economics, Pompeu Fabra University, Barcelona, Spain
3Department of Electrical Engineering, University of Pennsylvania, Philadelphia, PA, USA


References

vapnik, 1991, The Nature of Statistical Learning Theory

vapnik, 1982, Estimation of Dependencies Based on Empirical Data

vapnik, 1979, theory of pattern recognition. moscow, ussr: nauka, 1974, in russian; german translation, Theorie der Zeichenerkennung

10.1137/1116025

10.1109/TIT.1974.1055260

10.1016/B978-0-444-87877-9.50041-8

10.1109/18.79905

10.1145/1968.1972

10.1109/T-C.1970.222972

10.1109/72.485681

10.1109/72.661120

10.1023/A:1022627018023

krzyżak, 1984, almost everywhere convergence of recursive kernel regression function estimates, IEEE Trans Informat Theory, it 31, 91, 10.1109/TIT.1984.1056833

10.2307/2528418

10.1109/18.567668

10.1016/S0893-6080(05)80131-5

10.1007/978-3-642-20212-4

10.1016/0893-6080(88)90029-9

10.1109/SFCS.1988.21932

beck, 1979, the exponential rate of convergence of error for $k_n$-NN nonparametric regression and decision, Probl Contr Inform Theory, 8, 303

beakley, 1972, distribution-free pattern verification using statistically equivalent blocks, IEEE Transactions on Computers, c 21, 1337, 10.1109/T-C.1972.223505

bashkirov, 1964, potential function algorithms for pattern recognition learning machines, Autom Remote Contr, 25, 692

10.1162/089976698300017016

10.1109/18.661502

10.1109/18.86996

10.1162/neco.1989.1.1.151

10.1214/aos/1176343886

10.1109/72.80287

10.1214/aos/1176346711

10.1162/neco.1990.2.2.248

10.1214/aop/1176988847

10.1016/0885-064X(88)90020-9

wang, 1995, A theory of generalization in learning machines with neural network applications

10.1109/TIT.1971.1054698

10.1109/18.737520

vidyasagar, 1997, A Theory of Learning and Generalization

10.1080/01621459.1988.10478652

10.1016/0885-064X(91)90030-2

10.1007/BF00114777

10.1016/0885-064X(91)90040-5

10.1007/978-1-4615-2696-4_6

10.1016/0022-0000(93)90003-F

10.1109/18.481777

10.1109/18.135650

10.1109/18.382014

10.1214/aos/1032894460

10.1016/0167-7152(94)00207-O

mack, 1981, local properties of $k$-nearest neighbor regression estimates, SIAM J Algeb Discr Methods, 2, 311, 10.1137/0602035

macintyre, 1993, finiteness results for sigmoidal neural networks, Proc 25th Annu ACM Symp Theory of Computing, 325

10.1145/167088.167193

lunts, 1967, evaluation of attributes obtained in statistical decision rules, Eng Cybern, 3, 98

barron, 1996, risk bounds for model selection via penalization, Probab Theory Related Fields

barron, 1988, statistical learning networks: a unifying view, Proc 20th Symp Interface Computing Science and Statistics, 192

10.1016/0031-3203(86)90013-0

barron, 1975, learning networks improve computer-aided prediction and control, Comp Des, 75, 65

baldi, 1988, on properties of networks of neuron-like elements, Neural Information Processing Systems

10.1007/978-94-011-3222-0_42

10.1109/CDC.1989.70117

barron, 1991, Universal approximation bounds for superpositions of a sigmoidal function

10.1007/BF00993164

10.1137/1126059

10.1109/18.256500

10.1016/B978-0-12-741252-8.50015-7

breiman, 1984, Classification and Regression Trees

wasan, 1969, Stochastic Approximation

broomhead, 1988, multivariable functional interpolation and adaptive networks, Complex Syst, 2, 321

10.1109/TPAMI.1984.4767546

10.1016/0012-365X(81)90274-0

watson, 1964, Smooth regression analysis, 26, 359

white, 1991, nonparametric estimation of conditional quantiles using neural networks, Proc 23rd Symp Interface Computing Science and Statistics, 190

10.1016/0893-6080(90)90004-5

10.1109/TSSC.1969.300267

widrow, 1960, adaptive switching circuits, IRE Wescon Conv Rec, 96

10.1016/S0893-6080(09)80018-X

10.1016/0893-6080(89)90020-8

10.1109/TPAMI.1987.4767957

10.2139/ssrn.20534

10.1145/238061.238067

10.1073/pnas.81.10.3088

wang, 0, optimal stopping and effective machine complexity in learning, IEEE Trans Informat Theory, 10.1023/A:1013737224969

10.1073/pnas.79.8.2554

wang, 1993, when to stop: on optimal stopping and effective machine size in learning, Conf Neural Information Processing Systems

10.1109/TSSC.1970.300339

hertz, 1991, Introduction to the Theory of Neural Computation

10.1016/0893-6080(89)90018-X

10.1137/0304010

10.1016/0167-9473(91)90103-9

wang, 1994, machine size selection for optimal generalisation, Work Applications of Descriptional Complexity to Inductive Statistical and Visual Inference

10.1109/34.88569

10.1016/0047-259X(87)90106-0

10.1007/978-1-4612-2856-1_21

10.1214/aos/1176348789

10.1109/9.489276

10.1137/0403015

10.1016/0304-3975(91)90026-X

kolmogorov, 1961, $\epsilon$-entropy and $\epsilon$-capacity of sets in functional spaces, Amer Math Soc Transl, 17, 277

10.1006/jcss.1997.1479

kohonen, 1988, Self-Organization and Associative Memory, 10.1007/978-3-662-00784-6

kearns, 1994, Introduction to Computational Learning Theory, 10.7551/mitpress/3897.001.0001

10.1145/267460.267491

kearns, 1995, an experimental and theoretical comparison of model selection methods, Proc 8th Annu ACM Work Computational Learning Theory, 21

10.1006/jcss.1997.1477

10.1007/BF02579150

aizerman, 1970, extrapolative problems in automatic control and the method of potential functions, Amer Math Soc Transl, 87, 281

aizerman, 1964, the method of potential functions for the problem of restoring the characteristic of a function converter from randomly observed points, Automat Remote Contr, 25, 1546

10.1109/TAC.1974.1100705

10.1007/BF02900741

aleksander, 1990, An Introduction to Neural Computing

10.1109/TIT.1974.1055306

10.1214/aop/1176993141

breiman, 1996, Bias variance and arcing classifiers

10.1109/5.58342

judd, 1990, Neural Network Design and the Complexity of Learning, 10.7551/mitpress/4932.001.0001

10.1109/5.58324

braverman, 1965, the method of potential functions, Automat Remote Contr, 26, 2130

10.1145/130385.130401

10.1007/BF00058655

braverman, 1966, estimation of the rate of convergence of algorithms based on the potential function method, Automat Remote Contr, 27, 80

10.1214/aop/1176993668

bhattacharya, 1987, weak convergence of $k$-NN density and regression estimators with varying $k$ and applications, Ann Statist, 15, 976, 10.1214/aos/1176350487

10.1145/76359.76371

10.1016/S0893-6080(05)80010-3

darken, 1993, rate of approximation results motivated by robust neural network learning, Proc 6th ACM Work Computational Learning Theory, 303

dantzig, 1963, Linear Programming and Extensions

10.1007/BF02551274

cover, 1975, topics in statistical pattern recognition, Commun and Cybern, 10, 15

10.1007/978-1-4615-2696-4_10

10.1109/TIT.1979.1056099

das gupta, 1964, Nonparametric classification rules, 26, 25

dasarathy, 1991, Nearest Neighbor Pattern Classification Techniques

devijver, 1982, Pattern Recognition a Statistical Approach

devroye, 1978, a universal $k$-nearest neighbor procedure in discrimination, Proc 1978 IEEE Computer Society Conf Pattern Recognition and Image Processing, 142

10.1016/S0166-4115(08)60913-9

10.1007/BFb0097428

collomb, 1979, estimation de la régression par la méthode des $k$ points les plus proches: propriétés de convergence ponctuelle, C R l Académie des Sciences de Paris, 289, 245

10.2307/1403039

10.1007/BF00994018

10.1109/PGEC.1965.264137

cover, 1968, rates of convergence of nearest neighbor decision procedures, Proc 1st Annu Hawaii Conf Systems Theory, 413

cover, 1968, capacity problems for linear machines, Pattern Recognition, 283

10.1016/0047-259X(87)90105-9

10.1016/B978-1-4832-3093-1.50012-2

yang, 1998, an asymptotic property of model selection criteria, IEEE Trans Informat Theory

10.1109/TIT.1967.1053964

10.1137/1135057

10.1016/0047-259X(89)90027-4

10.1016/S0893-6080(09)80011-7

minsky, 1988, Perceptrons

mizoguchi, 1977, piecewise linear discriminant functions in pattern recognition, Syst -Comp -Contr, 8, 114

mclachlan, 1992, Discriminant Analysis and Statistical Pattern Recognition, 10.1002/0471725293

10.1145/267460.267488

meisel, 1973, a partitioning algorithm with application in pattern classification and the optimization of decision trees, IEEE Transactions on Computers, c 22, 93, 10.1109/T-C.1973.223603

michel-briand, 1994, Asymptotic behavior of the AID method

10.1109/TIT.1979.1056032

devroye, 1976, Nonparametric discrimination and density estimation

10.1214/aop/1176990746

10.1007/BF02478259

10.1109/TIT.1976.1055604

10.1109/TIT.1987.1057328

10.1016/0031-3203(94)00141-8

10.1016/0378-3758(89)90040-2

devroye, 1996, A Probabilistic Theory of Pattern Recognition, 10.1007/978-1-4612-0711-5

drucker, 1996, boosting decision trees, Advances in Neural Information Processing Systems 8, 148

10.1214/aop/1176995384

10.1214/aos/1176344949

duda, 1973, Pattern Classification and Scene Analysis

10.1007/BF00531618

10.1214/aos/1176345647

10.1016/0047-259X(82)90083-5

10.1109/TPAMI.1982.4767222

10.1109/TPAMI.1981.4767052

10.1214/aos/1176325633

10.1109/34.3915

10.1007/BF02564701

devroye, 1983, distribution-free exponential bound on the $l_1$ error of partitioning estimates of a regression function, Proc 4th Pannonian Symp Mathematical Statistics, 67

devroye, 1985, Nonparametric Density Estimation The $L_1$ View

10.1109/18.556602

10.1007/BFb0099432

10.1016/0001-8708(79)90047-1

10.1109/18.605573

10.1137/1109020

10.1137/1115015

10.1162/neco.1989.1.2.281

10.1080/01621459.1963.10500855

nilsson, 1990, The Mathematical Foundations of Learning Machines

10.1214/aos/1032526958

natarajan, 1991, Machine Learning A Theoretical Approach

10.1016/0893-6080(88)90028-7

10.1016/0031-3203(90)90086-Z

olshen, 1977, comments on a paper by c. j. stone, Ann Statist, 5, 632

parrondo, 1993, Vapnik-Chervonenkis bounds for generalization, 26, 2211

10.1214/aoms/1177704472

patrick, 1966, Distribution-free minimum conditional risk learning systems

patrick, 1967, Introduction to the performance of distribution-free conditional risk learning systems

10.1109/TC.1977.1674938

10.1109/TSMC.1986.289283

10.1162/neco.1989.1.2.161

10.1145/48014.63140

10.1109/5.58326

10.1214/ss/1177012394

pollard, 1984, Convergence of Stochastic Processes, 10.1007/978-1-4612-5254-2

quinlan, 1996, bagging, boosting, and c4.5, Proc 13th Nat Conf Artificial Intelligence, 725

quinlan, 1993, C4 5 Programs for Machine Learning

10.1016/0031-3203(83)90076-6

10.1214/aoms/1177698425

10.1109/18.335893

powell, 1987, radial basis functions for multivariable interpolation: a review, Algorithms for Approximation

pollard, 1990, Empirical Processes Theory and Applications, 10.1214/cbms/1462061091

10.1016/0031-3203(78)90029-8

10.1080/01621459.1972.10482413

10.1214/aoms/1177696909

10.1109/34.67645

10.1109/18.21221

10.1145/168304.168377

10.1016/0047-259X(84)90022-8

10.1214/aos/1176346815

10.1109/TIT.1987.1057309

10.1214/aos/1176344197

10.1016/0047-259X(80)90074-3

10.1142/0822

10.1214/aos/1176344196

10.1007/978-1-4899-3099-6_2

ripley, 1994, neural networks and related methods for classification, J Roy Statist Soc, 56, 409

rudin, 1974, Real and Complex Analysis

10.1007/978-1-4615-2696-4

10.1214/aoms/1177728190

rosenblatt, 1962, Principles of Neurodynamics

royall, 1966, A class of nonparametric estimators of a smooth regression function

10.1016/0031-3203(80)90029-1

10.1109/TIT.1978.1055898

10.1109/72.165594

györfi, 1978, an upperbound on the asymptotic error probability of the $k$-nearest neighbor rule, IEEE Trans Informat Theory, it 24, 512, 10.1109/TIT.1978.1055900

györfi, 1975, on the nonparametric estimate of a posteriori probabilities of simple statistical hypotheses, Colloquia Mathematica Societatis János Bolyai Topics in Information Theory, 299

10.1016/0167-8655(86)90054-1

10.1016/0890-5401(92)90010-D

haussler, 1988, predicting $\{0,1\}$ functions from randomly drawn points, Proc 29th IEEE Symp Foundations of Computer Science, 100, 10.1109/SFCS.1988.21928

haykin, 1994, Neural Networks

hebb, 1949, The Organization of Behavior

aizerman, 1964, the probability problem of pattern recognition learning and the method of potential functions, Automat Remote Contr, 25, 1307

hegedüs, 1993, on training simple neural networks and small-weight neurons, Proc 1st Euro Conf Computational Learning Theory

aizerman, 1964, theoretical foundations of the potential function method in pattern recognition learning, Automat Remote Contr, 25, 917

10.1109/T-C.1969.222728

sakurai, 1993, tighter bounds of the vc-dimension of three-layer networks, Proc WCNN, 3, 540

sauer, 1972, On the density of families of sets, 13, 145

10.1007/BF00116037

schapire, 1997, Boosting the margin A new explanation for the effectiveness of voting methods

rumelhart, 1986, Parallel Distributed Processing, 1, 10.7551/mitpress/5236.001.0001

sethi, 1991, decision tree performance enhancement using an artificial neural network interpretation, Artificial Neural Networks and Statistical Pattern Recognition Old and New Connections, 71, 10.1016/B978-0-444-88740-5.50010-4

sebestyen, 1962, Decision-Making Processes in Pattern Recognition

10.1007/3-540-59119-2_184

10.1007/978-3-0348-4118-4

10.1109/TPAMI.1982.4767278

10.1109/18.243433

erdélyi, 1956, Asymptotic Expansions

10.1016/0890-5401(89)90002-3

10.2307/2288636

10.1214/aos/1176344552

10.1063/1.36270

eeckman, 1988, the sigmoid nonlinearity in prepyriform cortex, Neural Information Processing Systems, 242

10.1109/18.335898

fix, 1951, Discriminatory analysis: Nonparametric discrimination Consistency properties Project 21-49-004, 261

fix, 1952, Discriminatory analysis: Nonparametric discrimination Small sample performance Project 21-49-004, 280

10.1111/j.1469-1809.1936.tb02137.x

10.1016/0166-218X(93)90179-R

shawe-taylor, 1996, a framework for structural risk minimization, Proc 9th Annu Conf Computational Learning Theory, 68

10.1016/0020-0190(91)90109-U

10.1145/168304.168385

10.1109/18.705570

10.1109/TIT.1981.1056403

10.1109/TPAMI.1980.4766988

10.1137/S0097539793259185

anderson, 1994, Introduction to Practical Neural Modeling

snapp, 1998, asymptotic expansions of the $k$ nearest neighbour risk, Ann Statist, 10.1214/aos/1024691080

anderson, 1966, some nonparametric multivariate procedures based on statistically equivalent blocks, Multivariate Analysis, 5

snapp, 1997, Asymptotic derivation of the finite-sample risk of the $k$-nearest neighbor classifier

10.1109/TIT.1970.1054532

anthony, 1992, Computational Learning Theory

10.1016/0166-218X(93)90126-9

10.2139/ssrn.37876

10.1109/TIT.1973.1055049

10.1007/978-1-4612-4782-1

10.1109/TPAMI.1982.4767195

10.1109/TPAMI.1984.4767523

10.1109/72.182701

bailey, 1978, a note on distance-weighted $k$-nearest neighbor rules, IEEE Trans Syst Man and Cybern, smc 8, 311

10.1109/TPAMI.1987.4767875

10.1109/TC.1977.1674849

10.1006/inco.1995.1136

fukunaga, 1972, Introduction to statistical pattern recognition

10.1109/TIT.1975.1055443

10.1016/0893-6080(89)90003-8

garey, 1979, Computers and Intractability A Guide to the Theory of NP-Completeness

gelfand, 1991, on tree structured classifiers, Artificial Neural Networks and Statistical Pattern Recognition Old and New Connections, 71

10.1109/ICSMC.1989.71407

10.1016/0022-0000(92)90039-L

10.1109/PGEC.1967.264667

10.1109/72.80210

10.1214/aos/1176344950

stengle, 1989, some new vapnik-chervonenkis classes, Ann Statist, 17, 1441, 10.1214/aos/1176347373

10.2307/2281538

stone, 1974, cross-validatory choice and assessment of statistical predictions, J Roy Statist Soc, 36, 111