Characterization and evaluation of similarity measures for pairs of clusterings
References
Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data-mining applications
Arabie P, Boorman SA (1973) Multidimensional scaling of measures of distance between partitions. J Math Psychol 10: 148–203
Berkhin P (2002) Survey of clustering data mining techniques. Technical report, Accrue Software
Braun-Blanquet J (1932) Plant sociology: the study of plant communities. McGraw-Hill Book Company, Inc, New York
Cheeseman P, Stutz J (1996) Bayesian classification (AutoClass): theory and results. In: Fayyad UN, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds) Advances in knowledge discovery and data mining. AAAI/MIT Press, Cambridge, pp 153–180
Coombs CH, Dawes RM, Tversky A (1970) Mathematical psychology: an elementary introduction. Prentice-Hall, Englewood Cliffs, NJ
Dennis RLH, Williams WR, Shreeve TG (1998) Faunal structures among European butterflies: evolutionary implications of bias for geography, endemism and taxonomic affiliation. Ecography 21: 181–203
Dice LE (1945) Measures of the amount of ecologic association between species. Ecology 26(3): 297–302
Fager EW, McGowan JA (1963) Zooplankton species groups in the North Pacific: co-occurrences of species can be used to derive groups whose members react similarly to water-mass types. Science 140: 453–460 doi: 10.1126/science.140.3566.453
Filkov V, Skiena S (2004) Heterogeneous data integration with the consensus clustering formalism. Data Integration in the Life Sciences (DILS). Int Workshop No 1 2994: 110–123
Forbes S (1925) Method of determining and measuring the associative relations of species. Science 61(1585): 518–524
Fowlkes EB, Mallows CL (1983) A method for comparing two hierarchical clusterings. J Am Stat Assoc 78(383): 553–569
Fred A, Jain A (2003) Robust data clustering. In: IEEE computer society conference on computer vision and pattern recognition
Halkidi M, Batistakis Y, Vazirgiannis M (2001) On clustering validation techniques. J Intell Inf Syst 17: 107–145
Hamann U (1961) Merkmalbestand und Verwandtschaftsbeziehungen der Farinosae: ein Beitrag zum System der Monokotyledonen. Willdenowia 2: 639–768
Hayek LC (1994) Analysis of amphibian biodiversity data. In: Heyer WR, Donnelly MA, McDiarmid RW, Hayek L-AC, Foster MS (eds) Measuring and monitoring biological diversity: standard methods for amphibians. Smithsonian Institution Press
Hinneburg A, Keim DA (2003) A general approach to clustering in large databases with noise. Knowl Inf Syst 5(4): 387–415
Holliday JD, Hu C-Y, Willett P (2002) Grouping of coefficients for the calculation of inter-molecular similarity and dissimilarity using 2D fragment bit-strings. Comb Chem High Throughput Screen 5(2): 155–166
Jaccard P (1901) Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines. Bulletin de la Société Vaudoise des Sciences Naturelles, pp 241–272
Karypis G, Han E-H, Kumar V (1999) Chameleon: a hierarchical clustering algorithm using dynamic modeling. IEEE Comput 32(8): 68–75
Knobbe AJ, Adriaans PW (1996) Analysis of binary association. In: Knowledge Discovery and Data Mining (KDD-96). Portland, Oregon, pp 311–314
Kulczynski S (1927) Zespoly roslin w Pieninach (Die Pflanzenassoziationen der Pieninen). Bulletin international de l’Académie polonaise des sciences et des lettres B(2): 57–203
Kvalseth TO (1987) Entropy and correlation: some comments. IEEE Trans Syst Man Cybern SMC-17: 517–519
Lee TT (1987) An information theoretic analysis of relational databases - part 1: data dependencies and information metric. IEEE Trans Softw Eng SE-13(10): 1049–1061
Lopez de Mantaras R (1989) ID3 revisited: a distance-based criterion for attribute selection. In: International symposium on methodologies for intelligent systems (ISMIS-89). Charlotte, North Carolina
MacQueen J (1967) Some methods for classification and analysis of multivariate observations
Malvestuto FM (1986) Statistical treatment of the information content of a database. Inf Syst 11(3): 211–223
Manning CD, Schütze H (1999) Foundations of statistical natural language processing. MIT Press, Cambridge
McConnaughey BH (1964) The determination and analysis of plankton communities. Marine Research Indonesia Special (Penelitian Laut Di Indonesia) Spec. no. 30
Meila M (2003) Comparing clusterings by variation of information. Proceedings of the 16th annual conference of computational learning theory (COLT)
Michael EL (1920) Marine ecology and the coefficient of association: A plea in behalf of quantitative biology. J Ecol 8(1): 54–59
Mirkin B (2001) Eleven ways to look at the chi-squared coefficient for contingency tables. Am Stat 55(6): 111–120
Mountford MD (1962) An index of similarity and its application to classificatory problems. In: Murphy PW (ed) Progress in soil zoology. Butterworth, London, pp 43–50
Pawlak Z, Wong SKM, Ziarko W (1988) Rough sets: probabilistic versus deterministic approach. Int J Man Mach Stud 29(1): 81–95
Powers DMW (2007) Expected information in the transmission of an equality selection of distribution/clustering or of individual class labels. Technical report, Flinders University (S.A.)
Press WH, Flannery BP, Teukolsky SA, Vetterling WT (1988) Numerical recipes in C: the art of scientific computing. Cambridge University Press, Cambridge
Quinlan JR (1990) Induction of decision trees. In: Shavlik JW, Dietterich TG (eds) Readings in machine learning. Morgan Kaufmann. Originally published in Machine Learning 1: 81–106, 1986
Rand WM (1971) Objective criteria for evaluation of clustering methods. J Am Stat Assoc 66(336): 846–850
Rogers DJ, Tanimoto TT (1960) A computer program for classifying plants. Science 132(3434): 1115–1118
Russell PF, Rao TR (1940) On habitat and association of species of anopheline larvae in south-eastern Madras. J Malaria Inst India 3: 153–178
Savage RM (1934) The breeding behavior of the common frog, Rana temporaria Linn., and of the common toad, Bufo bufo bufo Linn. Proceedings of the Zoological Society of London, pp 55–70
Sneath PHA, Sokal RR (1973) Numerical taxonomy. Freeman and Company, San Francisco
Sorgenfrei T (1958) Molluscan assemblages from the marine Middle Miocene of South Jutland and their environments
Strehl A, Ghosh J (2002) Cluster ensembles - a knowledge reuse framework for combining partitionings. J Mach Learn Res 3: 583–617
Tarwid K (1960) Szacowanie zbieznosci nisz ekologicznych gatunkow droga oceny prawdopodobienstwa spotykania sie ich w polowach. Ecol Polska B(6): 115–130
Theodoridis S, Koutroumbas K (1999) Pattern recognition. Academic Press, New York
Thurstone L (1927) A law of comparative judgement. Psychol Rev 34: 278–286
Wallace DL (1983) A method for comparing two hierarchical clusterings: comment. J Am Stat Assoc 78(383): 569–576
Wan SJ, Wong SKM (1989) A measure for concept dissimilarity and its applications in machine learning. In: International conference on computing and information. Toronto North, Canada, pp 23–27
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, Amsterdam
Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Yu PS, Zhou Z-H, Steinbach M, Hand DJ, Steinberg D (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14(1): 1–37
Yao YY, Wong SKM, Butz CJ (1999) On information theoretic measures of attribute importance. In: Zhong N (ed) PAKDD’99. Beijing, China, pp 133–137
Yule GU (1912) On the methods of measuring association between two attributes. J R Stat Soc 75(6): 579–642