Regularization Theory and Neural Networks Architectures

Neural Computation - Volume 7, Issue 2 - Pages 219-269 - 1995
Federico Girosi, Michael Jones, Tomaso Poggio
Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA

Abstract

We had previously shown that regularization principles lead to approximation schemes that are equivalent to networks with one layer of hidden units, called regularization networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well-known radial basis functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends radial basis functions (RBF) to hyper basis functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, some forms of projection pursuit regression, and several types of neural networks. We propose to use the term generalized regularization networks for this broad class of approximation schemes that follow from an extension of regularization. In the probabilistic interpretation of regularization, the different classes of basis functions correspond to different classes of prior probabilities on the approximating function spaces, and therefore to different types of smoothness assumptions. In summary, different multilayer networks with one hidden layer, which we collectively call generalized regularization networks, correspond to different classes of priors and associated smoothness functionals in a classical regularization principle. Three broad classes are (1) radial basis functions that can be generalized to hyper basis functions, (2) some tensor product splines, and (3) additive splines that can be generalized to schemes of the type of ridge approximation, hinge functions, and several perceptron-like neural networks with one hidden layer.
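
As a rough illustration of the simplest case mentioned in the abstract, the sketch below fits a regularization network with one Gaussian radial basis function centered on each data point. It assumes the standard form f(x) = sum_i c_i G(||x - x_i||), with coefficients obtained by solving (G + lambda I) c = y. The Gaussian kernel, its width `sigma`, the regularization weight `lam`, and all function names are illustrative choices, not taken from the paper, and the small pure-Python solver stands in for any linear algebra library.

```python
import math

def gaussian_kernel(x, center, sigma=1.0):
    """Gaussian radial basis function of the distance between x and center."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def solve_linear_system(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [A | b] without modifying the inputs.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    # Back substitution on the upper-triangular system.
    c = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (M[i][n] - s) / M[i][i]
    return c

def fit_regularization_network(X, y, lam=1e-3, sigma=1.0):
    """Return coefficients c solving (G + lam * I) c = y, one per data point."""
    n = len(X)
    G = [[gaussian_kernel(X[i], X[j], sigma) for j in range(n)] for i in range(n)]
    for i in range(n):
        G[i][i] += lam  # larger lam favors smoothness over fitting the data
    return solve_linear_system(G, y)

def predict(x, X, coeffs, sigma=1.0):
    """Evaluate f(x) = sum_i c_i * G(||x - x_i||): one hidden unit per data point."""
    return sum(c * gaussian_kernel(x, xi, sigma) for c, xi in zip(coeffs, X))

# Hypothetical one-dimensional example: samples of sin(x).
X_train = [(0.0,), (0.5,), (1.0,), (1.5,), (2.0,)]
y_train = [math.sin(p[0]) for p in X_train]
coeffs = fit_regularization_network(X_train, y_train, lam=1e-6, sigma=0.5)
print(predict((0.75,), X_train, coeffs, sigma=0.5))
```

In this sketch each training point acts as one hidden unit, which matches the one-hidden-layer network picture in the abstract. Moving the centers away from the data points or learning a weighted norm, in the spirit of hyper basis functions, would need a different fitting procedure that is not shown here.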
