A new approach to multivariate adaptive regression splines by using Tikhonov regularization and continuous optimization

Top - Volume 18 - Pages 377-395 - 2010
Pakize Taylan (1,2), Gerhard-Wilhelm Weber (1), Fatma Yerlikaya Özkurt (1)
(1) Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey
(2) Department of Mathematics, Dicle University, Diyarbakir, Turkey

Abstract

This paper introduces a model-based approach to multivariate adaptive regression splines (MARS), an important data mining tool that was originally organized in a more model-free way. MARS is a modern methodology from statistical learning that is important in both classification and regression, with an increasing number of applications in many areas of science, economy and technology. It is very useful for high-dimensional problems and shows great promise for fitting nonlinear multivariate functions. The MARS algorithm for estimating the model function consists of two parts: the forward and the backward stepwise algorithms. In this paper, we propose not to use the backward stepwise algorithm. Instead, we construct a penalized residual sum of squares for MARS as a Tikhonov regularization problem, which is also known as ridge regression. We treat this problem using continuous optimization techniques, which we expect to become an important complementary technology and a model-based alternative to the backward stepwise algorithm. In particular, we apply the elegant framework of conic quadratic programming, an area of convex optimization that is very well structured, resembling linear programming and hence permitting the use of powerful interior point methods. Based on these theoretical and algorithmic studies, the paper also contains an application to diabetes data. We evaluate and compare the performance of the established MARS and our new CMARS in classifying diabetic persons, where CMARS turns out to be very competitive and promising.
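To make the idea of a penalized residual sum of squares in Tikhonov form more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional input, hand-picked knots in place of the MARS forward step, and an identity penalty matrix L (plain ridge regression). It also swaps in the closed-form normal equations for the conic quadratic programming solver used in the paper. The names `hinge_basis` and `tikhonov_fit` are hypothetical.

```python
import numpy as np

def hinge_basis(x, knots):
    """Design matrix with an intercept and mirrored hinge pairs
    max(0, x - t) and max(0, t - x). Knots are fixed by hand here
    (an assumption); MARS would choose them in its forward step."""
    cols = [np.ones_like(x)]
    for t in knots:
        cols.append(np.maximum(0.0, x - t))
        cols.append(np.maximum(0.0, t - x))
    return np.column_stack(cols)

def tikhonov_fit(B, y, lam, L=None):
    """Minimize ||y - B theta||^2 + lam * ||L theta||^2 in closed form.
    L defaults to the identity (ridge regression); this choice is an
    assumption for illustration only."""
    if L is None:
        L = np.eye(B.shape[1])
    return np.linalg.solve(B.T @ B + lam * (L.T @ L), B.T @ y)

# Synthetic data, used only to exercise the sketch.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

B = hinge_basis(x, knots=[0.25, 0.5, 0.75])

# A larger penalty weight shrinks the coefficients and loosens the fit.
theta_small = tikhonov_fit(B, y, lam=1e-3)
theta_large = tikhonov_fit(B, y, lam=10.0)

print("coefficient norm, lam=1e-3:", np.linalg.norm(theta_small))
print("coefficient norm, lam=10  :", np.linalg.norm(theta_large))
print("residual norm,    lam=1e-3:", np.linalg.norm(y - B @ theta_small))
print("residual norm,    lam=10  :", np.linalg.norm(y - B @ theta_large))
```

The penalty weight `lam` controls the trade-off between accuracy and model complexity. In this sketch that regularization stands in for the pruning normally done by the backward stepwise algorithm.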
