Cluster-based least absolute deviation regression for dimension reduction

Journal of Statistical Theory and Practice, Volume 10, Pages 121–132, 2016
Yuexiao Dong¹, Chaozheng Yang¹
¹Department of Statistics, Temple University, Philadelphia, USA

Abstract

Least absolute deviation (LAD) regression is an important alternative to ordinary least squares (OLS) regression in linear models. A surprising result in Li and Duan (1989) showed that OLS can be used for dimension reduction in single-index models as long as the predictor distribution satisfies a global linear conditional mean assumption. The proposal in Li and Duan (1989) has two limitations. First, OLS is well known to be sensitive to outliers and can fail when the error distribution is heavy-tailed. Second, the global linearity assumption on the predictor distribution can be violated when the predictors are nonlinearly related to one another. To address these limitations, this article proposes cluster-based LAD for dimension reduction. By inheriting the advantages of LAD over OLS in linear models, our proposal is more robust to outliers and to heavy-tailed error distributions. We also replace the global linearity assumption with the more flexible local linearity assumption through k-means clustering.
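The abstract describes the estimator only at a high level. The following is a minimal sketch of the general idea under illustrative assumptions, not the article's exact algorithm: partition the predictors with k-means, fit an LAD regression of y on x within each cluster, and pool the cluster-wise slope vectors into one direction estimate. The pooling rule (leading eigenvector of a cluster-size-weighted sum of outer products, in the spirit of Li, Cook, and Nachtsheim 2004), the use of iteratively reweighted least squares (IRLS) as an approximate LAD solver in place of an exact linear-programming solver, and the helper names `lad_fit`, `kmeans`, and `cluster_lad_direction` are all hypothetical choices made for this illustration. The demo data come from a simulated single-index model with Cauchy (heavy-tailed) errors.

```python
import numpy as np


def lad_fit(X, y, n_iter=100, eps=1e-6):
    """Approximate LAD fit with intercept via IRLS (an assumed solver choice).

    Each step solves a weighted least squares problem with weights 1/|r_i|,
    which majorizes sum |r_i|; eps floors the residuals to avoid division by 0.
    Returns (intercept, slope vector).
    """
    Z = np.column_stack([np.ones(len(y)), X])
    coef = np.linalg.lstsq(Z, y, rcond=None)[0]  # start from the OLS fit
    for _ in range(n_iter):
        r = y - Z @ coef
        sw = 1.0 / np.sqrt(np.maximum(np.abs(r), eps))  # sqrt of IRLS weights
        coef = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)[0]
    return coef[0], coef[1:]


def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's algorithm; returns a cluster label for each row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Keep the old center if a cluster becomes empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def cluster_lad_direction(X, y, k=4, seed=0):
    """Hypothetical cluster-based LAD estimate of a single-index direction."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    labels = kmeans(X, k, rng)
    M = np.zeros((p, p))
    for j in range(k):
        idx = labels == j
        if idx.sum() <= p + 1:  # too few points for a local LAD regression
            continue
        _, b = lad_fit(X[idx], y[idx])
        M += idx.mean() * np.outer(b, b)  # weight by cluster proportion
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return eigvecs[:, -1]  # unit-norm leading eigenvector


# Simulated single-index model y = g(beta'x) + heavy-tailed error.
rng = np.random.default_rng(1)
n, p = 2000, 5
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
X = rng.standard_normal((n, p))
u = X @ beta
y = u + 0.2 * u**3 + 0.5 * rng.standard_cauchy(n)

b_hat = cluster_lad_direction(X, y, k=4)
print("|cos(b_hat, beta)| =", abs(b_hat @ beta))
```

Pooling outer products rather than averaging the slope vectors directly makes the combined estimate insensitive to sign flips between clusters; the direction is identified only up to sign and scale, which is all a single-index model determines.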

References

Brillinger, D. R. 1983. A generalized linear model with “Gaussian” regressor variables. In A festschrift for Erich L. Lehmann, ed. P. J. Bickel, K. A. Doksum, and J. L. Hodges, 97–114. Belmont, CA: Wadsworth International Group.
Cook, R. D. 1998. Regression graphics: Ideas for studying regressions through graphics. New York, NY: Wiley.
Cook, R. D., and B. Li. 2002. Dimension reduction for the conditional mean in regression. Annals of Statistics 30:455–74.
Härdle, W., P. Hall, and H. Ichimura. 1993. Optimal smoothing in single-index models. Annals of Statistics 21:157–58.
Hartigan, J. 1975. Clustering algorithms. New York, NY: Wiley.
Ichimura, H. 1993. Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics 58:71–120.
Koenker, R., and G. Bassett. 1978. Regression quantiles. Econometrica 46:33–50.
Li, K. C., and N. Duan. 1989. Regression analysis under link violation. Annals of Statistics 17:1009–52.
Li, L., R. D. Cook, and C. Nachtsheim. 2004. Cluster-based estimation for sufficient dimension reduction. Computational Statistics & Data Analysis 47:175–93.
Narula, S. C., and J. F. Wellington. 1982. The minimum sum of absolute errors regression: A state-of-the-art survey. International Statistical Review 50:317–26.
Portnoy, S., and R. Koenker. 1997. The Gaussian hare and the Laplacian tortoise: Computability of squared-error versus absolute-error estimators. Statistical Science 12:279–300.
Wu, T., K. Yu, and Y. Yu. 2010. Single-index quantile regression. Journal of Multivariate Analysis 101:1607–21.
Yu, K., and M. C. Jones. 1998. Local linear quantile regression. Journal of the American Statistical Association 93:228–37.