Clustering by Passing Messages Between Data Points

American Association for the Advancement of Science (AAAS) - Tập 315 Số 5814 - Trang 972-976 - 2007
Brendan J. Frey1, Delbert Dueck1
1Department of Electrical and Computer Engineering, University of Toronto, 10 King's College Road, Toronto, Ontario, M5S 3G4, Canada

Tóm tắt

Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such “exemplars” can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this works well only if that initial choice is close to a good solution. We devised a method called “affinity propagation,” which takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. We used affinity propagation to cluster images of faces, detect genes in microarray data, identify representative sentences in this manuscript, and identify cities that are efficiently accessed by airline travel. Affinity propagation found clusters with much lower error than other methods, and it did so in less than one-hundredth the amount of time.

Từ khóa


Tài liệu tham khảo

J. MacQueen, in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, L. Le Cam, J. Neyman, Eds. (Univ. of California Press, Berkeley, CA, 1967), vol. 1, pp. 281–297.

Supporting material is available on Science Online.

Software implementations of affinity propagation along with the data sets and similarities used to obtain the results described in this manuscript are available at www.psi.toronto.edu/affinitypropagation.

10.1038/ng1630

10.1093/nar/gkg111

C. D. Manning H. Schutze Foundations of Statistical Natural Language Processing (MIT Press Cambridge MA 1999).

10.1073/pnas.79.8.2554

10.1006/jcss.2002.1882

10.1109/TIT.2005.850085

10.1109/18.910572

A. P. Dempster, N. M. Laird, D. B. Rubin, Proc. R. Stat. Soc. B39, 1 (1977).

S. Dasgupta, L. J. Schulman, Proc. 16th Conf. UAI (Morgan Kaufman, San Francisco, CA, 2000), pp. 152–159.

10.1198/1061860043001

R. R. Sokal, C. D. Michener, Univ. Kans. Sci. Bull.38, 1409 (1958).

10.1109/34.868688

10.1126/science.1086309

J. Pearl Probabilistic Reasoning in Intelligent Systems (Morgan Kaufman San Mateo CA 1988).

10.1109/18.748992

10.1109/26.539767

10.1126/science.1073287

B. J. Frey, R. Koetter, N. Petrovic, in Proc. 14th Conf. NIPS (MIT Press, Cambridge, MA, 2002), pp. 737–743.

T. Meltzer, C. Yanover, Y. Weiss, in Proc. 10th Conf. ICCV (IEEE Computer Society Press, Los Alamitos, CA, 2005), pp. 428–435.

We thank B. Freeman G. Hinton R. Koetter Y. LeCun S. Roweis and Y. Weiss for helpful discussions and P. Dayan G. Hinton D. MacKay M. Mezard S. Roweis and C. Tomasi for comments on a previous draft of this manuscript. We acknowledge funding from Natural Sciences and Engineering Research Council of Canada Genome Canada/Ontario Genomics Institute and the Canadian Institutes of Health Research. B.J.F. is a Fellow of the Canadian Institute for Advanced Research.