A review of clustering techniques and developments

Neurocomputing - Volume 267 - Pages 664-681 - 2017
Amit Saxena1, Mukesh Prasad2, Akshansh Gupta3, Neha Bharill4, Om Prakash Patel4, Aruna Tiwari4, Meng Joo Er5, Weiping Ding6, Chin‐Teng Lin2
1Department of Computer Science & IT, Guru Ghasidas Vishwavidyalaya, Bilaspur, India
2Centre for Artificial Intelligence, University of Technology Sydney, Sydney, Australia
3School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
4Department of Computer Science and Engineering, Indian Institute of Technology Indore, India
5School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
6School of Computer and Technology, Nantong University, Nantong, China
