Application of text-mining for updating protein post-translational modification annotation in UniProtKB

BMC Bioinformatics - Volume 14 - Pages 1-9 - 2013

Anne-Lise Veuthey¹, Alan Bridge¹, Julien Gobeill², Patrick Ruch², Johanna R McEntyre³, Lydie Bougueleret¹, Ioannis Xenarios¹,⁴,⁵

¹Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva 4, Switzerland

²BiTeM Group, Information Science Department, University of Applied Sciences, Carouge, Switzerland

³EMBL-European Bioinformatics Institute, Hinxton, UK

⁴Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Génopode, Lausanne, Switzerland

⁵University of Lausanne, Lausanne, Switzerland

Abstract

The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information, we have developed a system that extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB. The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, according to the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large-scale proteomics experiments. The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, provided that a thorough understanding of the working process and requirements is first obtained. This system can be accessed at http://eagl.unige.ch/PTM/.
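To make the idea of pattern-matching sentence extraction concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the trigger-word table, the residue pattern, the naive sentence splitter, and the function name `extract_ptm_sentences` are all assumptions chosen for illustration, and the sample text is invented. It only shows how a sentence can be kept when it mentions both a modification type and a modification site.

```python
import re

# Hypothetical trigger patterns, one per modification type (illustrative only).
PTM_TRIGGERS = {
    "phosphorylation": re.compile(r"\bphosphorylat\w*", re.IGNORECASE),
    "acetylation": re.compile(r"\bacetylat\w*", re.IGNORECASE),
    "methylation": re.compile(r"\bmethylat\w*", re.IGNORECASE),
}

# Map residue names (full or three-letter) to one-letter codes.
RESIDUE_CODES = {
    "serine": "S", "ser": "S",
    "threonine": "T", "thr": "T",
    "tyrosine": "Y", "tyr": "Y",
    "lysine": "K", "lys": "K",
    "arginine": "R", "arg": "R",
}

# Matches site mentions such as "Ser-15", "Ser15" or "lysine 382".
SITE_PATTERN = re.compile(
    r"\b(serine|threonine|tyrosine|lysine|arginine|ser|thr|tyr|lys|arg)"
    r"[\s-]?(\d+)\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_ptm_sentences(text):
    """Return sentences that mention both a PTM trigger word and a residue site."""
    results = []
    for sentence in split_sentences(text):
        # Normalize every site mention to one-letter code plus position, e.g. "S15".
        sites = [
            RESIDUE_CODES[name.lower()] + position
            for name, position in SITE_PATTERN.findall(sentence)
        ]
        if not sites:
            continue
        for ptm_type, trigger in PTM_TRIGGERS.items():
            if trigger.search(sentence):
                results.append(
                    {"sentence": sentence, "ptm_type": ptm_type, "sites": sites}
                )
    return results


if __name__ == "__main__":
    # Invented sample text, used only to exercise the sketch.
    sample = (
        "p53 is phosphorylated at Ser-15 by ATM. "
        "The protein localizes to the nucleus. "
        "Acetylation of lysine 382 enhances DNA binding."
    )
    for hit in extract_ptm_sentences(sample):
        print(hit["ptm_type"], hit["sites"], "|", hit["sentence"])
```

A real curation pipeline would additionally need protein name recognition and ranking, which the abstract mentions but this sketch does not attempt.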

