Coreference resolution: A review of general methodologies and applications in the clinical domain

Journal of Biomedical Informatics - Volume 44 - Pages 1113-1122 - 2011
Jiaping Zheng (1), Wendy W. Chapman (2), Rebecca S. Crowley (3), Guergana K. Savova (1, 4)
(1) Children’s Hospital Boston, 300 Longwood Ave, Boston, MA 02115, United States
(2) University of California, San Diego, 9500 Gilman Dr., Bldg 2 #0728, La Jolla, CA 92093, United States
(3) University of Pittsburgh Medical Center, 5150 Centre Ave, Pittsburgh, PA 15232, United States
(4) Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, United States
