Computing LTS Regression for Large Data Sets

Data Mining and Knowledge Discovery - Volume 12, Issue 1 - Pages 29-45 - 2006
Peter J. Rousseeuw (1), Katrien Van Driessen (2)
(1) Department of Mathematics and Computer Science, Universiteit Antwerpen, Antwerpen, Belgium
(2) Faculty of Applied Economics, Universiteit Antwerpen, Antwerpen, Belgium


References

Agulló, J. 1997a. Computación de estimadores con alto punto de ruptura. Ph.D. Thesis, University of Alicante, Spain.

Agulló, J. 1997b. Exact algorithms to compute the least median of squares estimate in multiple linear regression. In L1-Statistical Procedures and Related Topics, Y. Dodge (ed.), The IMS Lecture Notes – Monograph Series, Vol. 31, pp. 133–146.

Chork, C.J. 1990. Unmasking multivariate anomalous observations in exploration geochemical data from sheeted-vein tin mineralization near Emmaville, N.S.W., Australia. Journal of Geochemical Exploration, 37:191–203.

Coakley, C.W. and Hettmansperger, T.P. 1993. A bounded influence, high breakdown, efficient regression estimator. Journal of the American Statistical Association, 88:872–880.

Hawkins, D.M. 1994. The feasible solution algorithm for least trimmed squares regression. Computational Statistics and Data Analysis, 17:185–196.

Hawkins, D.M. and Olive, D.J. 1999. Improved feasible solution algorithms for high breakdown estimation. Computational Statistics and Data Analysis, 30:1–11.

Hössjer, O. 1994. Rank-based estimates in the linear model with high breakdown point. Journal of the American Statistical Association, 89:149–158.

Huang, Z. 1998. Extensions of the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery, 2:283–304.

Kaufman, L. and Rousseeuw, P.J. 1986. Clustering large data sets. In Pattern Recognition in Practice II, E.S. Gelsema and L.N. Kanal (eds.) Elsevier/North-Holland, pp. 425–437.

Kaufman, L. and Rousseeuw, P.J. 1990. Finding Groups in Data, New York: John Wiley.

Meer, P., Mintz, D., Rosenfeld, A., and Kim, D. 1991. Robust regression methods in computer vision: a review. International Journal of Computer Vision, 6:59–70.

Mili, L., Phaniraj, V., and Rousseeuw, P.J. 1991. Least median of squares estimation in power systems (with discussion). IEEE Trans. on Power Systems, 6:511–523.

Mili, L., Cheniae, N.S., and Rousseeuw, P.J. 1996. Robust state estimation based on projection statistics (with discussion). IEEE Trans. on Power Systems, 11:1118–1127.

Ng, R.T. and Han, J. 1994. Efficient and effective clustering methods for spatial data mining. Proceedings of the International Conference on Very Large Data Bases (VLDB ’94), Santiago, Chile, September 1994, pp. 144–155.

Odewahn, S.C., Djorgovski, S.G., Brunner, R.J., and Gal, R. 1998. Data From the Digitized Palomar Sky Survey. Technical Report, California Institute of Technology.

Rousseeuw, P.J. 1984. Least median of squares regression. Journal of the American Statistical Association, 79:871–880.

Rousseeuw, P.J. 1985. Multivariate estimation with high breakdown point. In Mathematical Statistics and Applications, Vol. B, W. Grossmann, G. Pflug, I. Vincze and W. Wertz (eds.) Dordrecht: Reidel, pp. 283–297.

Rousseeuw, P.J. 1997. Introduction to positive-breakdown methods. In Handbook of Statistics, Vol. 15: Robust Inference, G.S. Maddala and C.R. Rao (eds.) Amsterdam: Elsevier, pp. 101–121.

Rousseeuw, P.J. and Hubert, M. 1997. Recent developments in PROGRESS. In L1-Statistical Procedures and Related Topics, Y. Dodge (ed.) The IMS Lecture Notes – Monograph Series, Vol. 31, pp. 201–214.

Rousseeuw, P.J. and Leroy, A.M. 1987. Robust Regression and Outlier Detection, New York: John Wiley.

Rousseeuw, P.J. and Van Driessen, K. 1999. A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41:212–223.

Rousseeuw, P.J. and van Zomeren, B.C. 1990. Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, 85:633–639.

Steele, J.M. and Steiger, W.L. 1986. Algorithms and complexity for least median of squares regression. Discrete Applied Mathematics, 14:93–100.

Stromberg, A.J. 1993. Computing the exact least median of squares estimate and stability diagnostics in multiple linear regression. SIAM Journal on Scientific Computing, 14:1289–1299.

Simpson, D.G., Ruppert, D., and Carroll, R.J. 1992. On one-step GM-estimates and stability of inferences in linear regression. Journal of the American Statistical Association, 87:439–450.

Wang, C.M., Vecchia, D.F., Young, M. and Brilliant, N.A. 1997. Robust regression applied to optical fiber dimensional quality control. Technometrics, 39:25–33.

Woodruff, D.L. and Rocke, D.M. 1994. Computable robust estimation of multivariate location and shape in high dimension using compound estimators. Journal of the American Statistical Association, 89:888–896.

Yohai, V.J. 1987. High breakdown point and high efficiency robust estimates for regression. Annals of Statistics, 15:642–656.

Zhang, T., Ramakrishnan, R., and Livny, M. 1997. BIRCH: a new data clustering algorithm and its applications. Data Mining and Knowledge Discovery, 1:141–182.