A robust proposal of estimation for the sufficient dimension reduction problem

TEST - Volume 30 - Pages 758-783 - 2021
Andrea Bergesio1, María Eugenia Szretter Noste2, Víctor J. Yohai2,3,4
1Departamento de Matemática, Facultad de Ingeniería Química, Universidad Nacional del Litoral, Santa Fe, Argentina
2Instituto de Cálculo, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
3Departamento de Matemática, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
4Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina

Abstract

In nonparametric regression, a large number of covariates leads to the curse of dimensionality. When the sample is not large enough, one way to deal with this problem is to use a reduced number of linear combinations of the explanatory variables that contain most of the information about the response variable. This leads to the so-called sufficient dimension reduction problem. The purpose of this paper is to obtain robust estimators of a sufficient dimension reduction, that is, estimators that are not greatly affected by the presence of a small fraction of outliers in the data. One way to derive a sufficient dimension reduction is by means of the principal fitted components (PFC) model. We obtain robust estimators for the parameters of this model and for the corresponding sufficient dimension reduction, based on a \(\tau\)-scale (\(\tau\)-estimators). Strong consistency of these estimators is proven under weak assumptions on the underlying distribution. The \(\tau\)-estimators for the PFC model are computed using an iterative algorithm. A Monte Carlo study compares the performance of \(\tau\)-estimators and maximum likelihood estimators. The results show clear advantages for \(\tau\)-estimators in the presence of outlier contamination and only a small loss of efficiency when outliers are absent. A proposal to select the dimension of the reduction space based on cross-validation is given. These estimators are implemented in R through functions contained in the package tauPFC. As the PFC model is a special case of multivariate reduced-rank regression, our proposal can be applied directly to that model as well.
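To make the notion of a \(\tau\)-scale concrete, the following Python sketch computes a univariate \(\tau\)-scale of a set of residuals: first an M-scale \(s\) solving \(\frac{1}{n}\sum_i \rho_1(r_i/s) = b\), then \(\tau^2 = s^2 \frac{1}{n}\sum_i \rho_2(r_i/s)\). It is only an illustration of the general idea and not the multivariate procedure of the paper or the interface of the tauPFC package. The bisquare \(\rho\) functions, the tuning constants `c1`, `c2`, `b`, and all function names are assumptions chosen for the example; the abstract does not specify them. No consistency factor is applied, so the value is not calibrated to the standard deviation under normality.

```python
import numpy as np


def rho_bisquare(u, c):
    """Tukey bisquare rho, scaled to rise from 0 to 1 (assumed choice of rho)."""
    v = np.clip(np.abs(u) / c, 0.0, 1.0)
    return 1.0 - (1.0 - v**2) ** 3


def m_scale(r, c=1.548, b=0.5, tol=1e-10, max_iter=200):
    """M-scale s solving mean(rho(r / s)) = b by fixed-point iteration."""
    r = np.asarray(r, dtype=float)
    s = np.median(np.abs(r)) / 0.6745  # MAD-type starting value
    if s == 0.0:
        return 0.0
    for _ in range(max_iter):
        s_new = s * np.sqrt(np.mean(rho_bisquare(r / s, c)) / b)
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s


def tau_scale(r, c1=1.548, c2=6.08, b=0.5):
    """tau-scale: M-scale (rho1) times the root mean of rho2(r / s)."""
    r = np.asarray(r, dtype=float)
    s = m_scale(r, c=c1, b=b)
    if s == 0.0:
        return 0.0
    return s * np.sqrt(np.mean(rho_bisquare(r / s, c2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=500)
    contaminated = clean.copy()
    contaminated[:50] = 50.0  # 10% gross outliers
    print("std  clean / contaminated:", np.std(clean), np.std(contaminated))
    print("tau  clean / contaminated:", tau_scale(clean), tau_scale(contaminated))
```

In this sketch the gross outliers contribute at most 1 to each \(\rho\) average, so the \(\tau\)-scale moves far less than the standard deviation when the sample is contaminated. That bounded influence is the property that \(\tau\)-estimators exploit.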
