A robust fuzzy k-means clustering model for interval valued data

Computational Statistics, Volume 21, Pages 251-269, 2006
Pierpaolo D’Urso (1), Paolo Giordani (2)
(1) Dipartimento di Scienze Economiche, Gestionali e Sociali, Università degli Studi del Molise, Campobasso, Italy
(2) Dipartimento di Statistica, Probabilità e Statistiche Applicate, Università di Roma “La Sapienza”, Rome, Italy

Abstract

This paper introduces a robust fuzzy k-means clustering model for interval valued data. The distinguishing feature of the proposed model is its ability to handle anomalous interval valued data by reducing the effect of such outliers on the clustering. In the interval case, the concept of anomalous data involves both the center and the width (the radius) of an interval. To show how the model works, the results of a simulation experiment and of an application to real interval valued data are discussed.
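The abstract does not spell out the model, so the following is only a minimal sketch of one way such robustness can be obtained, not the authors' formulation. It assumes that each interval is represented by its center and its radius, that dissimilarity is the sum of the squared Euclidean distances between centers and between radii, and that outliers are absorbed by a noise cluster lying at a fixed distance `delta` from every object (a device known from the noise-clustering literature). The function names, `delta`, the fuzzifier `m`, the initial prototypes, and the toy data are all illustrative assumptions.

```python
import numpy as np

def memberships(x, protos, delta, m):
    """Fuzzy memberships in the regular clusters, with a noise cluster at distance delta."""
    # Squared distance = squared center difference + squared radius difference,
    # because x and protos stack center and radius coordinates side by side.
    d2 = ((x[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)  # guard against division by zero
    inv = d2 ** (-1.0 / (m - 1.0))
    inv_noise = (delta ** 2) ** (-1.0 / (m - 1.0))
    # The noise term in the denominator lets memberships sum to less than 1.
    return inv / (inv.sum(axis=1, keepdims=True) + inv_noise)

def robust_fuzzy_kmeans_intervals(centers, radii, init_protos, delta, m=2.0, n_iter=50):
    """Alternate membership and prototype updates for a fixed number of iterations."""
    x = np.hstack([centers, radii])
    protos = np.asarray(init_protos, dtype=float)
    for _ in range(n_iter):
        w = memberships(x, protos, delta, m) ** m
        # Each prototype is the weighted mean of the stacked centers and radii.
        protos = (w.T @ x) / w.sum(axis=0)[:, None]
    u = memberships(x, protos, delta, m)
    noise = 1.0 - u.sum(axis=1)  # membership in the noise cluster
    return u, noise, protos

# Toy data (hypothetical): two groups of intervals plus one interval whose
# center is unremarkable but whose radius is very large.
centers = np.array([[0.0], [0.2], [-0.1], [10.0], [10.2], [9.9], [5.0]])
radii = np.array([[1.0], [1.1], [0.9], [1.0], [1.2], [0.8], [30.0]])
init_protos = [[0.0, 1.0], [10.0, 1.0]]  # (center, radius) of each starting prototype

u, noise, protos = robust_fuzzy_kmeans_intervals(centers, radii, init_protos, delta=3.0)
print(np.round(noise, 3))
print(np.round(protos, 3))
```

In this toy run the last interval is atypical only in its width, which illustrates why both the center and the radius enter the distance: most of its membership goes to the noise cluster, so it barely moves the two prototypes. The choice of `delta` controls how far an interval must lie from every prototype before it is treated as noise.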
