A Network Analysis Model for Disambiguation of Names in Lists

Springer Science and Business Media LLC, Volume 11, Issue 2, pp. 119–139, 2005
Bradley Malin¹, Edoardo M. Airoldi¹, Kathleen M. Carley²
¹Data Privacy Laboratory, Institute for Software Research International, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
²Center for the Computational Analysis of Social and Organizational Systems, Institute for Software Research International, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA

References

Adamic, L. and E. Adar (2003), “Friends and Neighbors on the Web,” Social Networks, 25(3), 211–230.

Airoldi, E., A. Slavkovic and S. Fienberg (2005), “Interactive Tetrahedron Applet: A Tool for Exploring the Geometry of 2 × 2 Contingency Tables,” Department of Statistics Technical Report CMU-STAT-05-824, Carnegie Mellon University: Pittsburgh, PA.

Airoldi, E. and B. Malin (2004), “Data Mining Challenges for Electronic Safety: The Case of Fraudulent Intent Detection in E-mails,” in Proceedings of the IEEE Workshop on Privacy and Security Aspects of Data Mining, Brighton, England, pp. 57–66.

Albert, R. and A.L. Barabási (2002), “Statistical Mechanics of Complex Networks,” Reviews of Modern Physics, 74, 47–97.

Bagga, A. and B. Baldwin (1998), “Entity-based Cross-Document Coreferencing Using the Vector Space Model,” in Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics, San Francisco, CA, pp. 79–85.

Banko, M. and E. Brill (2001), “Scaling to Very Large Corpora for Natural Language Disambiguation,” in Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, France, pp. 26–33.

Barabási, A.L. and R. Albert (1999), “Emergence of Scaling in Random Networks,” Science, 286, 509–512.

Bekkerman, R. and A. McCallum (2005), “Disambiguating Web Appearances of People in a Social Network,” in Proceedings of the 2005 World Wide Web Conference, Chiba, Japan.

Bhattacharya, I. and L. Getoor (2004a), “Iterative Record Linkage for Cleaning and Integration,” in Proceedings of the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, Paris, France, pp. 11–18.

Bhattacharya, I. and L. Getoor (2004b), “Deduplication and Group Detection Using Links,” in Proceedings of the 2004 ACM SIGKDD Workshop on Link Analysis and Group Detection, Seattle, WA.

Bishop, Y., S. Fienberg and P. Holland (1975), Discrete Multivariate Analysis: Theory and Practice, The MIT Press, Cambridge, MA.

Brill, E. and P. Resnik (1994), “A Rule-based Approach to Prepositional Phrase Attachment Disambiguation,” in Proceedings of the 15th International Conference on Computational Linguistics, Kyoto, Japan, pp. 1198–1204.

Brown, P., S. Della Pietra, V. Della Pietra and R. Mercer (1991), “Word-sense Disambiguation using Statistical Methods,” in Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, Berkeley, CA, pp. 264–270.

Chan, S. and J. Franklin (1998), “Symbolic Connectionism in Natural Language Disambiguation,” IEEE Transactions on Neural Networks, 9(5), 739–755.

Chao, G. and M.G. Dyer (2000), “Word Sense Disambiguation of Adjectives using Probabilistic Networks,” in Proceedings of the 17th International Conference on Computational Linguistics, Saarbrücken, Germany, pp. 152–158.

Coffman, T., S. Greenblatt and S. Marcus (2004), “Graph-Based Technologies for Intelligence Analysis,” Communications of the ACM, 47(3), 45–47.

Cohen, W., P. Ravikumar and S. Fienberg (2003), “A Comparison of String Matching Tasks for Names and Addresses,” in Proceedings of the IJCAI Workshop on Information Integration on the Web, Acapulco, Mexico.

Culotta, A., R. Bekkerman and A. McCallum (2004), “Extracting Social Networks and Contact Information from Email and the Web,” in Proceedings of the First Conference on Email and Anti-Spam, Mountain View, CA.

Diesner, J. and K. Carley (2005), “Exploration of Communication Networks from the Enron Email Corpus,” in Proceedings of the 2005 SIAM Workshop on Link Analysis, Counterterrorism and Security, Newport Beach, CA, pp. 3–14.

Duda, R.O., P.E. Hart and D.G. Stork (2001), Pattern Classification, 2nd Edition, Wiley, New York, NY.

Fienberg, S. (1970), “An Iterative Procedure for Estimation in Contingency Tables,” Annals of Mathematical Statistics, 41(3), 907–917.

Gale, W.A., K.W. Church and D. Yarowsky (1992), “A Method for Disambiguating Word Senses in Large Corpora,” Computers and Humanities, 26, 415–439.

Ginter, F., J. Boberg, J. Jarvinen and T. Salakoski (2004), “New Techniques for Disambiguation in Natural Language and Their Application to Biological Text,” Journal of Machine Learning Research, 5, 605–621.

Girvan, M. and M. Newman (2002), “Community Structure in Social and Biological Networks,” in Proceedings of the National Academy of Sciences, USA, 99, 7821–7826.

Harada, M., S. Sato and K. Kazama (2004), “Finding Authoritative People on the Web,” in Proceedings of the Joint Conference on Digital Libraries, Tucson, AZ.

Hatzivassiloglou, V., P.A. Duboue and A. Rzhetsky (2001), “Disambiguating Proteins, Genes, and RNA in Text: A Machine Learning Approach,” Bioinformatics, 17, 97–106.

Hiro, K., H. Wu and T. Furugori (1996), “Word-Sense Disambiguation with a Corpus-Based Semantic Network,” Journal of Quantitative Linguistics, 3, 244–251.

Internet Movie Database. http://www.imdb.com. Accessed June 20, 2004.

Jaro, M. (1989), “Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida,” Journal of the American Statistical Association, 84, 414–420.

Jensen, K. and J.L. Binot (1987), “Disambiguating Prepositional Phrase Attachments by Using Online Definitions,” Computational Linguistics, 13(3/4), 251–260.

Jensen, D. and J. Neville (2000), “Iterative Classification in Relational Data,” in Proceedings of the AAAI-2000 Workshop on Learning Statistical Models From Relational Data, pp. 13–20.

Kalashnikov, D., S. Mehrotra and Z. Chen (2005), “Exploiting Relationships for Domain-independent Data Cleaning,” in Proceedings of the 2005 SIAM International Conference on Data Mining, Newport Beach, CA, pp. 262–273.

Klimt, B. and Y. Yang (2004), “The Enron Email Corpus: A New Dataset for Email Classification Research,” in Proceedings of the 15th European Conference on Machine Learning, Pisa, Italy, pp. 217–226.

Larsen, B. and C. Aone (1999), “Fast and Effective Text Mining Using Linear-time Document Clustering,” in Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, pp. 16–22.

Lesk, M. (1986), “Automatic Sense Disambiguation: How to Tell a Pine Cone from an Ice Cream Cone,” in Proceedings of the 1986 ACM SIGDOC Conference, New York, NY, pp. 24–26.

Malin, B. (2005), “Unsupervised Name Disambiguation via Social Network Similarity,” in Proceedings of the 2005 SIAM Workshop on Link Analysis, Counterterrorism, and Security, Newport Beach, CA, pp. 93–102.

Mann, G. and D. Yarowsky (2003), “Unsupervised Personal Name Disambiguation,” in Proceedings of the 7th Conference on Computational Natural Language Learning, Edmonton, Canada, pp. 33–40.

Neville, J., M. Adler and D. Jensen (2003), “Clustering Relational Data using Attribute and Link Information,” in Proceedings of the IJCAI Text Mining and Link Analysis Workshop, Acapulco, Mexico.

Newman, M. (2003), “The Structure and Function of Complex Networks,” SIAM Review, 45, 167–256.

Ng, H.T. (1997), “Exemplar-Based Word Sense Disambiguation: Some Recent Improvements,” in Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Somerset, New Jersey, pp. 208–213.

Shetty, J. and J. Adibi (2004), “Enron Email Dataset: Database Schema and Brief Statistical Report,” Information Sciences Institute Technical Report, University of Southern California.

Sweeney, L. (2004), “Finding Lists of People on the Web,” ACM Computers and Society, 34(1).

Thompson, P. (2005), “Text Mining, Names, and Security,” Journal of Database Management, 16(1), 54–59.

Véronis, J. and N. Ide (1990), “Word Sense Disambiguation with Very Large Neural Networks Extracted from Machine Readable Dictionaries,” in Proceedings of the 13th International Conference on Computational Linguistics, Helsinki, Finland, pp. 389–394.

Wacholder, N., Y. Ravin and M. Choi (1997), “Disambiguation of Proper Names in Text,” in Proceedings of the 5th Applied Natural Language Processing Conference, Washington, DC, pp. 202–208.

Wei, J. (2004), “Markov Edit Distance,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(3), 311–321.

Winkler, W. (1995), “Matching and Record Linkage,” in B. Cox et al. (eds.), Business Survey Methods, Wiley, New York, NY, pp. 355–384.

Yarowsky, D. (1992), “Word-sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora,” in Proceedings of the 14th International Conference on Computational Linguistics, Nantes, France, pp. 454–460.

Zelnik-Manor, L. and P. Perona (2004), “Self-Tuning Spectral Clustering,” in Advances in Neural Information Processing Systems 17, Vancouver, Canada, pp. 1601–1608.