A review of clustering techniques and developments
References
Duda, 2001
Zhang, 2014, Cross-validation based weights and structure determination of Chebyshev-polynomial neural networks for pattern classification, Pattern Recognit., 47, 3414, 10.1016/j.patcog.2014.04.026
Nakayama, 1998, Pattern classification by linear goal programming and its extensions, J. Global Optim., 12, 111, 10.1023/A:1008244409770
Bishop, 2006, Pattern Recognition and Machine Learning, Springer, Berlin, ISBN 978-0-387-31073-2
Zhang, 2000, Neural networks for classification: a survey, IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., 30, 451, 10.1109/5326.897072
Zhang, 2011, Data-core-based fuzzy min–max neural network for pattern classification, IEEE Trans. Neural Netw., 22, 2339, 10.1109/TNN.2011.2175748
Jiang, 2003, Constructing and training feed-forward neural networks for pattern classification, Pattern Recognit., 36, 853, 10.1016/S0031-3203(02)00087-0
Ou, 2007, Multi-class pattern classification using neural networks, Pattern Recognit., 40, 4, 10.1016/j.patcog.2006.04.041
Paola, 1995, A detailed comparison of back propagation neural network and maximum-likelihood classifiers for urban land use classification, IEEE Trans. Geosci. Remote Sens., 33, 981, 10.1109/36.406684
Rumelhart, 1986
Zhou, 1999, Verification of the nonparametric characteristics of back-propagation neural networks for image classification, IEEE Trans. Geosci. Remote Sens., 37, 771, 10.1109/36.752193
Jaeger, 1999, Supervised fuzzy classification of SAR data using multiple sources, IEEE Int. Geosci. Remote Sens. Symp.
Marzano, 2007, Supervised fuzzy-logic classification of hydrometeors using C-band weather radars, IEEE Trans. Geosci. Remote Sens., 45, 3784, 10.1109/TGRS.2007.903399
Xue, 2013, Particle swarm optimization for feature selection in classification: a multi-objective approach, IEEE Trans. Cybern., 43, 1656, 10.1109/TSMCB.2012.2227469
Saxena, 2008, Novel approach for the use of small world theory in particle swarm optimization
Pawlak, 1991
Dalai, 2013, Rough-set-based feature selection and classification for power quality sensing device employing correlation techniques, IEEE Sens. J., 13, 563, 10.1109/JSEN.2012.2219144
Farida, 2014, Hybrid decision tree and naïve Bayes classifiers for multi-class classification tasks, Expert Syst. Appl., 41, 1937, 10.1016/j.eswa.2013.08.089
Han, 2011
Rokach, 2005, Clustering methods, 331
Saxena, 2010, Evolutionary methods for unsupervised feature selection using Sammon's stress function, Fuzzy Inf. Eng., 2, 229, 10.1007/s12543-010-0047-4
Jain, 2010, Data clustering: 50 years beyond k-means, Pattern Recognit. Lett., 31, 651, 10.1016/j.patrec.2009.09.011
Merriam-Webster Online Dictionary, 2008
Castro, 2000, A fast and robust general purpose clustering algorithm
Fraley, 1998
Sneath, 1973
King, 1967, Step-wise clustering procedures, J. Am. Stat. Assoc., 62, 86, 10.1080/01621459.1967.10482890
Ward, 1963, Hierarchical grouping to optimize an objective function, J. Am. Stat. Assoc., 58, 236, 10.1080/01621459.1963.10500845
Murtagh, 1984, A survey of recent advances in hierarchical clustering algorithms which use cluster centers, Comput. J., 26, 354, 10.1093/comjnl/26.4.354
Nagpal, 2013, Review based on data clustering algorithms
Periklis, 2002
Guha, 1998
Karypis, 1999, Chameleon: a hierarchical clustering algorithm using dynamic modeling, IEEE Comput., 32, 68, 10.1109/2.781637
Lam, 2014, Clustering, academic press library in signal processing, Signal Process. Theory Mach. Learn., 1, 1115
MacQueen, 1967, Some methods for classification and analysis of multivariate observations, 1, 281
Gersho, 1992
Dunn, 1973, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybern., 3, 32, 10.1080/01969727308546046
Bezdek, 1981
Yager, 1994, Approximate clustering via the mountain method, IEEE Trans. Syst. Man Cybern., 24, 1279, 10.1109/21.299710
Gath, 1989, Unsupervised optimal fuzzy clustering, IEEE Trans. Pattern Anal. Mach. Intell., 11, 773, 10.1109/34.192473
Hathaway, 2000, Generalized fuzzy c-means clustering strategies using Lp norm distances, IEEE Trans. Fuzzy Syst., 8, 576, 10.1109/91.873580
Krishnapuram, 1993, A possibilistic approach to clustering, IEEE Trans. Fuzzy Syst., 1, 98, 10.1109/91.227387
Zahn, 1971, Graph-theoretical methods for detecting and describing gestalt clusters, IEEE Trans. Comput., C-20, 68, 10.1109/T-C.1971.223083
Urquhart, 1982, Graph-theoretical clustering based on limited neighborhood sets, Pattern Recognit., 15, 173, 10.1016/0031-3203(82)90069-3
Fisher, 1987, Knowledge acquisition via incremental conceptual clustering, Mach. Learn., 2, 139, 10.1007/BF00114265
Haykin, 1999
Xu, 2005, Survey of clustering algorithms, IEEE Trans. Neural Netw., 16, 645, 10.1109/TNN.2005.845141
Xu, 2010, Clustering algorithms in biomedical research: a review, IEEE Rev. Biomed. Eng., 3, 120, 10.1109/RBME.2010.2083647
McLachlan, 1997
Banfield, 1993, Model-based Gaussian and non-Gaussian clustering, Biometrics, 49, 803
Ester, 1996, A density-based algorithm for discovering clusters in large spatial databases with noise
Cheeseman, 1996, Bayesian classification (AutoClass): theory and results, 153
Wallace, 1994, Intrinsic classification by MML-the snob program, 37
Wang, 1997, STING: a statistical information grid approach to spatial data mining, 86
Sheikholeslami, 2000, WaveCluster: a wavelet-based clustering approach for spatial data in very large databases, Int. J. Very Large Data Bases, 8, 289, 10.1007/s007780050009
Agrawal, 1998, Automatic subspace clustering of high dimensional data for data mining applications, 94
Schwefel, 1981
Fogel, 1965
Holland, 1975
Goldberg, 1989
Kennedy, 2001, Swarm Intelligence
Kennedy, 1995, Particle swarm optimization, 1942
Dorigo, 2004
Glover, 1986, Future paths for integer programming and links to artificial intelligence, Comput. Oper. Res., 13, 533, 10.1016/0305-0548(86)90048-1
Al-Sultan, 1995, A tabu search approach to clustering problem, Pattern Recognit., 28, 1443, 10.1016/0031-3203(95)00022-R
Pedrycz, 2002, Collaborative fuzzy clustering, Pattern Recognit. Lett., 23, 1675, 10.1016/S0167-8655(02)00130-7
Coletta, 2012, Collaborative fuzzy clustering algorithms: some refinements and design guidelines, IEEE Trans. Fuzzy Syst., 20, 444, 10.1109/TFUZZ.2011.2175400
Pedrycz, 2008, Collaborative clustering with the use of fuzzy c-means and its quantification, Fuzzy Sets Syst., 159, 2399, 10.1016/j.fss.2007.12.030
Pedrycz, 2005
Prasad, 2013, Vertical collaborative fuzzy c-means for multiple EEG data sets, 8102, 246
Pizzuti, 2009, 859
Gregory, 2008, A fast algorithm to find overlapping communities in networks, 408
Ahn, 2010, Link communities reveal multi-scale complexity in networks, Nature, 466, 761, 10.1038/nature09182
Forestier, 2010, Collaborative clustering with background knowledge, Data Knowl. Eng., 69, 211, 10.1016/j.datak.2009.10.004
Handl, 2007, An evolutionary approach to multiobjective clustering, IEEE Trans. Evolut. Comput., 11, 56, 10.1109/TEVC.2006.877146
Konak, 2006, Multiobjective optimization using genetic algorithms: a tutorial, Reliab. Eng. Syst. Saf., 91, 992, 10.1016/j.ress.2005.11.018
Faceli, 2006, Multiobjective clustering ensemble
Law, 2004, Multiobjective data clustering, IEEE Conf. Comp. Vis. Pattern Recognit., 2, 424
Forsyth, 2002
Consortium, 2001, Initial sequencing and analysis of the human genome, Nature, 409, 860, 10.1038/35057062
Dorai, 1995, Shape spectra based view grouping for free form object, 3, 240
Connell, 1998, Learning prototypes for on-line handwritten digits, 1, 182
Rasmussen, 1992, 419
McKiernan, 1990
Hedberg, 1996, Searching for the mother lode: tales of the first data miners, IEEE Expert Intell. Syst. Appl., 11, 4
Cohen, 1996
Saxena, 2010, Dimensionality reduction with unsupervised feature selection and applying non-Euclidean norms for classification accuracy, Int. J. Data Wareh. Min., 6, 22, 10.4018/jdwm.2010040102
Al-Sultan, 1996, Computational experience on four algorithms for the hard clustering problem, Pattern Recognit. Lett., 17, 295, 10.1016/0167-8655(95)00122-0
Michalski, 1983, Automated construction of classifications: conceptual clustering versus numerical taxonomy, IEEE Trans. Pattern Anal. Mach. Intell., 5, 396, 10.1109/TPAMI.1983.4767409
Kolodner, 1983, Reconstructive memory: a computer model, Cogn. Sci., 7, 281, 10.1207/s15516709cog0704_2
Carpineto, 1993, An order-theoretic approach to conceptual clustering, 33
Talavera, 2001, Generality-based conceptual clustering with probabilistic concepts, IEEE Trans. Pattern Anal. Mach. Intell., 23, 196, 10.1109/34.908969
Hadzikadic, 1989, Concept formation by incremental conceptual clustering, 831
Biswas, 1998, Iterate: a conceptual clustering algorithm for data mining, IEEE Trans. Syst. Man Cybern. Part C, 28, 219, 10.1109/5326.669556
Thompson, 1991, Concept formation in structured domains
Jonyer, 2001, Graph-based hierarchical conceptual clustering, J. Mach. Learn. Res., 2, 19
Lebowitz, 1987, Experiments with incremental concept formation: UNIMEM, Mach. Learn., 2, 103, 10.1007/BF00114264
Hanson, 1989, Conceptual clustering, categorization and polymorphy, Mach. Learn., 3, 343, 10.1007/BF00116838
Vesanto, 2000, Clustering of the self-organizing map, IEEE Trans. Neural Netw., 11, 586, 10.1109/72.846731
Upton, 1985, Spatial data analysis by example, 1
Strehl, 2000, Impact of similarity measures on web-page clustering, 58
Fortier, 1996, Clustering procedures, 493
Gluck, 1985, Information, uncertainty, and the utility of categories, 283
Condorcet, 1785
Marcotorchino, 1979
Corter, 1992, Explaining basic categories: feature predictability and information, Psychol. Bull., 111, 291, 10.1037/0033-2909.111.2.291
Strehl, 2000, Clustering guidance and quality evaluation using relationship-based visualization, 483
Stehman, 1997, Selecting and interpreting measures of thematic classification accuracy, Remote Sens. Environ., 62, 77, 10.1016/S0034-4257(97)00083-7
Rand, 1971, Objective criteria for the evaluation of clustering methods, J. Am. Stat. Assoc., 66, 846, 10.1080/01621459.1971.10482356
Rijsbergen, 1979
Frey, 2007, Clustering by passing messages between data points, Science, 315, 972, 10.1126/science.1136800
Fowlkes, 1983, A method for comparing two hierarchical clusterings, J. Am. Stat. Assoc., 78, 553, 10.1080/01621459.1983.10478008
Olson, 2008
Powers, 2007, Evaluation: from precision, recall and F-factor to ROC, informedness, markedness & correlation, J. Mach. Learn. Technol., 2, 37
Jaccard, 1901, Distribution de la flore alpine dans le bassin des dranses et dans quelques régions voisines, Bull. Soc. Vaud. Sci. Nat., 37, 241
Grefenstette, 1986, Optimization of control parameters for genetic algorithms, IEEE Trans. Syst. Man Cybern., 16, 122, 10.1109/TSMC.1986.289288
Lin, 2013, Designing Mamdani type fuzzy rule using a collaborative FCM scheme
Lawler, 2001, Chapter 4.5: Combinatorial implications of max-flow min-cut theorem; Chapter 4.6: Linear programming interpretation of max-flow min-cut theorem, 117
Papadimitriou, 1998, Chapter 6.1: the max-flow, min-cut theorem, 120
Fotheringham, 1998, Geographically weighted regression: a natural evolution of the expansion method for spatial data analysis, Environ. Plann., 30, 1905, 10.1068/a301905
Honarkhah, 2010, Stochastic simulation of patterns using distance-based pattern modeling, Math. Geosci., 42, 487, 10.1007/s11004-010-9276-7
Tahmasebi, 2012, Multiple-point geostatistical modeling based on the cross-correlation functions, Comput. Geosci., 16, 779, 10.1007/s10596-012-9287-1
Guha, 1999, ROCK: a robust clustering algorithm for categorical attributes
Zhang, 1996, BIRCH: an efficient data clustering method for very large databases
Jiang, 2014, epiC: an Extensible and Scalable System for Processing Big Data, 541
Huang, 1997
Hinneburg, 1998, An efficient approach to clustering in large multimedia databases with noise
Berry, 1996
Fennell, 2003, The effectiveness of demographics and psychographic variables for explaining brand and product category use, Quant. Market. Econ., 1, 223, 10.1023/A:1024686630821
Kiang, 2007, The effect of sample size on the extended self-organizing map network: a market segmentation application, Comput. Stat. Data Anal., 51, 5940, 10.1016/j.csda.2006.11.011
Dolnicar, 2003, Using cluster analysis for market segmentation: typical misconceptions, established methodological weaknesses and some recommendations for improvement, Australas. J. Market Res., 11, 5
Wagner, 2005, The number of clusters in market segmentation, 157
Durbin, 1998
Kaplan, 2012, Prisoners of abstraction? The theory and measure of genetic variation, and the very concept of “Race”, Biol. Theory, 7, 401, 10.1007/s13752-012-0048-0
Carrington, 2011, Social network analysis: an introduction, 1
The News-Press, 2010, Yippy growing by leaps, bounds, 23 May 2010, retrieved 24 May 2010
Dirk, 2002, A concept-oriented approach to support software maintenance and reuse activities
Dias, 2003, Organizing the knowledge used in software maintenance, J. Univ. Comput. Sci., 9, 641
Ricci, 2011, Introduction to recommender systems handbook, 1
www.educationaldatamining.org, 2013
Baker, 2010, Data mining for education, 7, 112
Siemens, 2012, Learning analytics and educational data mining: towards communication and collaboration, 252
Huth, 2008, Classifications of atmospheric circulation patterns: recent advances and applications, Ann. N.Y. Acad. Sci., 1146, 105, 10.1196/annals.1446.019
Bewley, 2011, Real-time volume estimation of a dragline payload, 1571
Manning, 2009
Nguyen, 2012, Clustering with multi-viewpoint-based similarity measure, IEEE Trans. Knowl. Data Eng., 24, 988, 10.1109/TKDE.2011.86
Bravais, 1846, 9, 255
Pearson, 1896, Mathematical contributions to the theory of evolution, III, regression, heredity, and panmixia, Philos. Trans. R. Soc. Lond. Ser. A, 187, 253, 10.1098/rsta.1896.0007
Sørensen, 1948, A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons, K. Dan. Vidensk. Selsk., 5, 1
Dice, 1945, Measures of the amount of ecologic association between species, Ecology, 26, 297, 10.2307/1932409
Hamilton, 1994
Tsay, 2005
Arora, 2014, A survey of clustering techniques for big data analysis
Shirkhorshidi, 2014, 8583, 707
Wang, 2002, Clustering by pattern similarity in large data sets
Bharill, 2016, Fuzzy Based Scalable Clustering Algorithms for Handling Big Data Using Apache Spark, IEEE Trans. Big Data, 2, 339, 10.1109/TBDATA.2016.2622288
Russom, 2011
Xiao, 2013, Multi-view k-means clustering on big data
Fan, 2013, Mining big data: current status and forecast to the future, ACM SIGKDD Explor. Newsl., 14, 1, 10.1145/2481244.2481246
Shvachko, 2010, The hadoop distributed file system
Dean, 2010, MapReduce: a flexible data processing tool, Commun. ACM, 53, 72, 10.1145/1629175.1629198
Celeux, 1992, A classification EM algorithm for clustering and two stochastic versions, Comput. Stat. Data Anal., 14, 315, 10.1016/0167-9473(92)90042-E
Kaufman, 1990
Ng, 2002, CLARANS: a method for clustering objects for spatial data mining, IEEE Trans. Knowl. Data Eng., 14, 1003, 10.1109/TKDE.2002.1033770
Sisodia, 2012, Clustering techniques: a brief survey of different clustering algorithms, Int. J. Latest Trends Eng. Technol., 1, 82
Zhong, 2010, A graph-theoretical clustering method based on two rounds of minimum spanning trees, Pattern Recognit., 43, 752, 10.1016/j.patcog.2009.07.010
Condon, 2001, Algorithms for graph partitioning on the planted partition model, Random Struct. Algorithms, 18, 116, 10.1002/1098-2418(200103)18:2<116::AID-RSA1001>3.0.CO;2-2
Donath, 1973, Lower bounds for the partitioning of graphs, IBM J. Res. Dev., 17, 420, 10.1147/rd.175.0420
Shi, 2000, Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell., 22, 888, 10.1109/34.868688
Rohe, 2011, Spectral clustering and the high-dimensional stochastic block model, Ann. Stat., 39, 1878, 10.1214/11-AOS887
Günnemann, 2010, Subspace clustering meets dense sub-graph mining: a synthesis of two paradigms
Macropol, 2010, Scalable discovery of best clusters on large graphs, 3, 693
Whang, 2012, Scalable and memory-efficient clustering of large-scale social networks
Karypis, 1998, A fast and high quality multilevel scheme for partitioning irregular graphs, SIAM J. Sci. Comput., 20, 359, 10.1137/S1064827595287997
Karypis, 1998, Multilevel k-way partitioning scheme for irregular graphs, J. Parallel Distrib. Comput., 48, 96, 10.1006/jpdc.1997.1404
Yan, 2009, Fast approximate spectral clustering, 907
Liu, 2013, Large-scale spectral clustering on graphs
Yang, 2015, A divide and conquer framework for distributed graph clustering
Ghosh, 2013, Comparative analysis of k-means and fuzzy c-means algorithms, Int. J. Adv. Comput. Sci. Appl., 4, 35
Niwattanakul, 2013, Using of Jaccard coefficient for keywords similarity, I, 1
Dubes, Cluster analysis and related issues, in: C. Chen, L. Pau, P. Wang (Eds.), Handbook of Pattern Recognition and Computer Vision, World Scientific, Singapore, 3
Jain, 1988
Shi, 2013, A link clustering based overlapping community detection algorithm, Data Knowl. Eng., 87, 394, 10.1016/j.datak.2013.05.004
Palla, 2005, Uncovering the overlapping community structure of complex networks in nature and society, Nature, 435, 814, 10.1038/nature03607
Wolpert, 1997, No free lunch theorems for optimization, IEEE Trans. Evol. Comput., 1, 67, 10.1109/4235.585893
Bensmail, 1997, Inference in model-based cluster analysis, Stat. Comput., 7, 1, 10.1023/A:1018510926151