A new approach to isotonic agglomerative hierarchical clustering

Journal of Classification - Tập 8 - Trang 217-237 - 1991
Werner Vach1,2, Paul O. Degens1,3
1University of Dortmund, Dortmund, Germany
2Institut für medizinische Biometrie und medizinische Informatik, Albert-Ludwigs-Universität, Freiburg, FRG
3Medizinisches Institut für Umwelthygenie, Düsseldorf 1, FRG

Tóm tắt

Hierarchical clustering methods must be isotonic for the construction of ultrametric. We present a general strategy to widen the class of isotonic methods implemented by agglomerative algorithms. At each step of the agglomeration we allow one of several admissible pairs to be chosen. Then under mild assumptions an appropriate definition of admissibility guarantees isotony. Moreover we consider the use of the new methods to compute locally optimal ultrametrics. Two examples demonstrate the ability to define new agglomerative methods superior to their traditional competitors.

Tài liệu tham khảo

BAKER, F. B. (1974), “Stability of Two Hierarchical Grouping Techniques, Case 1: Sensitivity to Data Errors,”Journal of the American Statistical Association, 69, 440–445. BARLOW, R. E., BARTHOLOMEW, D.J., BREMNER, J. M., and BRUNK, H. D. (1972), Statistical Inference under Order Restrictions, New York: Wiley. BATAGELJ, V. (1981), “Note on Ultrametric Hierarchical Clustering Algorithms,”Psychometrika 46, 351–353. BOCK, H. H. (1974),Automatische Klassifikation, Göttingen: Vandenhoeck and Ruprecht. BRUYNOOGHE, M. (1978), “Classification ascendante hiérarchique des grands ensembles des données: un algorithme rapide fondé sur la construction des voisinages réductibles,”Les Cahiers de l'Analyse des Donées, III, 7–33. DAY, W. H. E., and EDELSBRUNNER, H. (1985), “Investigation of Proportional Link Linkage Clustering Methods,”Journal of Classification, 2, 239–254. D'ANDRADE, R. G. (1978), “U-Statistic Hierarchical Clustering,”Psychometrika, 43, 59–67. DEGENS, P. O. (1983), “Hierarchical Clustering Methods as Maximum Likelihood Estimators,” inNumerical Taxonomy, Ed., J. Felsentein, Berlin: Springer, 249–253. DEGENS, P. O. (1985), “Ultrametric Approximation to Distances,”Computational Statistics Quarterly, 2, 93–101. DEGENS, P. O. (1988), “Reconstruction of Phylogenies by Weighted Genetic Distances,” inClassification and Related Methods of Data Analysis, Ed., H. H. Bock, Proceedings of the First Conference of the IFCS, Amsterdam: North Holland, 727–739. DEGENS, P. O., LAUSEN, B., and VACH, W. (1990), “Reconstruction of Phylogenies by Distance Data: Mathematical Framework and Statistical Analysis,” inTrees and Hierarchical Structures, Eds., A. Dress and A. v. Haeseler, Lecture Notes in Biomathematics, 84, Springer, 9–24. GLASBEY, C. A. (1980), “A Synthesis of Single Linkage and Complete Linkage Clustering Criteria,” inCOMPSTAT 1980, Eds., M. M. Barritt and D. Wishart, Vienna: Physica, 399–404. HARTIGAN, J. A. (1967), “Representation of Similarity Matrices by Trees,”Journal of the American Statistical Association, 62, 1140–1158. HUBERT, L. (1972), “Some Extensions of Johnson's Hierarchical Clustering Algorithms,”Psychometrika, 37, 261–274. HUBERT, L. (1973), “Monotone Invariant Clustering Procedures,”Psychometrika, 38, 47–72. JARDINE, N., and SIBSON, R. (1971),Mathematical Taxonomy, New York: Wiley. JOHNSON, S. C. (1967), “Hierarchical Clustering Schemes,”Psychometrika, 32, 241–254. KRIVANEK, M. (1986), “On the Computational Complexity of Clustering,” inData Analysis and Informatics IV, Eds., E. Diday, et al., Amsterdam: North Holland, 89–96. LANCE, G. N., and WILLIAMS, W. T. (1966), “A Generalized Sorting Strategy for Computer Classifications,”Nature 212, 218. LANCE, G. N. and WILLIAMS, W. T. (1967), “A General Theory of Classification Sorting Strategies. 1. Hierarchical Systems,”Computer Journal, 9, 373–380. LAUSEN, B. (1989), “Maximum Likelihood Agglomeration in Hierarchical Cluster Analysis,” inKlassifikation und Ordnung, Studien zur Klassifikation 19, Ed., R. Wille, Frankfurt: INDEKS, 355–360. MARGUSH, T., and MCMORRIS, F. (1981), “Consensus n-trees,”Bulletin of Mathematical Biology, 43, 239–244. MILLIGAN, G. W. (1979), “Ultrametric Hierarchical Clustering Algorithms,”Psychometrika, 44, 343–346. MURTAGH, F. (1984), “Complexity of Hierarchical Clustering Algorithms: State of the Art,”Computational Statistics Quartery, 1, 101–113. OSTERMANN, R., and DEGENS, P. O. (1984), “Eigenschaften des Average-Linkage-Verfahrens anhand einer Monte-Carlo-Studie,” inAnwendungen der Klassifikation: Datenanalyse und Numerische Klassifikation, Studien zur Klassifikation 15, Ed., H. H. Bock, Frankfurt: INDEKS, 108–114. SIBLEY, C. G., and AHLQUIST, J. E. (1983), “Phylogeny and Classification of Birds Based on the Data of DNA-DNA-Hybridization,”Current Ornithology, 1, 245–292. VACH, W., and DEGENS, P. O. (1986), “Starting More Robust Estimation of Ultrametrics,” inClassification and Its Environment, Studien zur Klassifikation 17, Eds., P. O. Degens, H.-J. Hermes and O. Opitz, Frankfurt: INDEKS, 239–246. VACH, W., and DEGENS, P. O. (1988a), “The System of Common Lower Neighbors of a Hierarchy,” inClassification and Related Methods of Data Analysis, Proceedings of the First Conference of the IFCS, Ed., H. H. Bock, Amsterdam: North Holland, 165–172. VACH, W., and DEGENS, P. O. (1988b), “Improvement of the Average Linkage Method,” Research Report, Department of Statistics, University of Dortmund. VACH, W., and DEGENS, P. O. (1989), “A New Average Linkage Method,” inKlassifikation und Ordnung, Studien zur Klassifikation 19, Ed., R. Wille, Frankfurt: INDEKS, 375–380. WARD, J. H. (1963), “Hierarchical Grouping to Optimize an Objective Function,”Journal of the American Statistical Association, 58, 236–244.