InterPro in 2019: improving coverage, classification and access to protein sequence annotations

Nucleic Acids Research - Volume 47, Issue D1 - Pages D351-D360 - 2019
Alex Mitchell1, Teresa K. Attwood2, Patricia C. Babbitt3, Matthias Blum1, Peer Bork4, Alan Bridge5, Shoshana Brown3, Hsin-Yu Chang1, Sara El-Gebali1, Matthew Fraser1, Julian Gough6, David R Haft7, Hongzhan Huang8, Ivica Letunić9, Rodrigo López1, Aurélien Luciani1, Fábio Madeira1, Aron Marchler‐Bauer10, Huaiyu Mi11, Darren A. Natale12, Marco Necci13,14,15, Gift Nuka1, Christine Orengo16, Arun Prasad Pandurangan6, Typhaine Paysan-Lafosse1, Sebastien Pesseat1, Simon Potter1, Matloob Qureshi1, Neil D. Rawlings1, Nicole Redaschi5, Lorna Richardson1, Catherine Rivoire5, Gustavo A Salazar1, Amaia Sangrador‐Vegas1, Christian Sigrist5, Ian Sillitoe16, Granger G. Sutton7, Narmada Thanki10, Paul D Thomas11, Silvio C. E. Tosatto14, Siew-Yit Yong1, Robert Finn1
1European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
2School of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
3Department of Bioengineering & Therapeutic Sciences, University of California, San Francisco, CA 94158, USA
4European Molecular Biology Laboratory, Structural and Computational Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
5Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
6Medical Research Council Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK
7J. Craig Venter Institute (JCVI), 9605 Medical Center Drive, Suite 150, Rockville, MD 20850, USA
8Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
9biobyte solutions GmbH, Bothestr 142, 69126 Heidelberg, Germany
10National Center for Biotechnology Information, National Library of Medicine, NIH Bldg, 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
11Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
12Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA
13Department of Agricultural Sciences, University of Udine, via Palladio 8, 33100 Udine, Italy
14Department of Biomedical Sciences, University of Padua, Via U. Bassi 58/B, 35131 Padua, Italy
15Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all’Adige, Italy
16Structural and Molecular Biology, University College London, Darwin Building, London WC1E 6BT, UK

References

The UniProt Consortium, 2017, UniProt: the universal protein knowledgebase, Nucleic Acids Res., 45, D158, 10.1093/nar/gkw1099

Lewis, 2018, Gene3D: extensive prediction of globular domains in proteins, Nucleic Acids Res., 46, D435, 10.1093/nar/gkx1069

Marchler-Bauer, 2017, CDD/SPARCLE: functional classification of proteins via subfamily domain architectures, Nucleic Acids Res., 45, D200, 10.1093/nar/gkw1129

Pedruzzi, 2015, HAMAP in 2015: updates to the protein family classification and annotation system, Nucleic Acids Res., 43, D1064, 10.1093/nar/gku1002

Mi, 2017, PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements, Nucleic Acids Res., 45, D183, 10.1093/nar/gkw1138

Finn, 2016, The Pfam protein families database: towards a more sustainable future, Nucleic Acids Res., 44, D279, 10.1093/nar/gkv1344

Nikolskaya, 2007, PIRSF family classification system for protein functional and evolutionary analysis, Evol. Bioinform. Online, 2, 197

Attwood, 2012, The PRINTS database: a fine-grained protein sequence annotation and analysis resource–its status in 2012, Database (Oxford), 2012, bas019, 10.1093/database/bas019

Bru, 2005, The ProDom database of protein domain families: more emphasis on 3D, Nucleic Acids Res., 33, D212, 10.1093/nar/gki034

Sigrist, 2013, New and continuing developments at PROSITE, Nucleic Acids Res., 41, D344, 10.1093/nar/gks1067

Letunic, 2018, 20 years of the SMART protein domain annotation resource, Nucleic Acids Res., 46, D493, 10.1093/nar/gkx922

Akiva, 2014, The Structure-Function Linkage Database, Nucleic Acids Res., 42, D521, 10.1093/nar/gkt1130

Oates, 2015, The SUPERFAMILY 1.75 database in 2014: a doubling of data, Nucleic Acids Res., 43, D227, 10.1093/nar/gku1041

Haft, 2013, TIGRFAMs and genome properties in 2013, Nucleic Acids Res., 41, D387, 10.1093/nar/gks1234

Piovesan, 2018, MobiDB 3.0: more annotations for intrinsic disorder, conformational diversity and interactions in proteins, Nucleic Acids Res., 46, D471, 10.1093/nar/gkx1071

Nielsen, 2017, Predicting secretory proteins with SignalP, Methods Mol. Biol., 1611, 59, 10.1007/978-1-4939-7015-5_6

Käll, 2007, Advantages of combined transmembrane topology and signal peptide prediction–the Phobius web server, Nucleic Acids Res., 35, W429, 10.1093/nar/gkm256

Krogh, 2001, Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes, J. Mol. Biol., 305, 567, 10.1006/jmbi.2000.4315

Lupas, 1991, Predicting coiled coils from protein sequences, Science, 252, 1162, 10.1126/science.252.5009.1162

Durinx, 2017, Identifying ELIXIR Core Data Resources. [version 2; referees: 2 approved], F1000Res, 5, 2422, 10.12688/f1000research.9656.2

Aken, 2016, The Ensembl gene annotation system, Database (Oxford), 2016, baw093, 10.1093/database/baw093

Kersey, 2016, Ensembl Genomes 2016: more genomes, more complexity, Nucleic Acids Res., 44, D574, 10.1093/nar/gkv1209

Mir, 2018, PDBe: towards reusable data delivery infrastructure at protein data bank in Europe, Nucleic Acids Res., 46, D486, 10.1093/nar/gkx1070

Conesa, 2008, Blast2GO: a comprehensive suite for functional analysis in plant genomics, Int. J. Plant Genomics, 2008, 619832, 10.1155/2008/619832

Pedro, 2016, PhytoPath: an integrative resource for plant pathogen genomics, Nucleic Acids Res., 44, D688, 10.1093/nar/gkv1052

Huson, 2016, MEGAN Community edition - interactive exploration and analysis of Large-Scale microbiome sequencing data, PLoS Comput. Biol., 12, e1004957, 10.1371/journal.pcbi.1004957

Mitchell, 2018, EBI Metagenomics in 2017: enriching the analysis of microbial communities, from sequence reads to assemblies, Nucleic Acids Res., 46, D726, 10.1093/nar/gkx967

Jones, 2014, InterProScan 5: genome-scale protein function classification, Bioinformatics, 30, 1236, 10.1093/bioinformatics/btu031

Ashburner, 2000, Gene ontology: tool for the unification of biology. The Gene Ontology Consortium, Nat. Genet., 25, 25, 10.1038/75556

Sangrador-Vegas, 2016, GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations, Database (Oxford), 2016, baw027, 10.1093/database/baw027

Finn, 2017, InterPro in 2017-beyond protein family and domain annotations, Nucleic Acids Res., 45, D190, 10.1093/nar/gkw1107

Velankar, 2013, SIFTS: Structure Integration with Function, Taxonomy and Sequences resource, Nucleic Acids Res., 41, D483, 10.1093/nar/gks1258

Watkins, 2017, ProtVista: visualization of protein sequence annotations, Bioinformatics, 33, 2040, 10.1093/bioinformatics/btx120

Pravda, 2018, MOLEonline: a web-based tool for analyzing channels, tunnels and pores (2018 update), Nucleic Acids Res., 46, W368, 10.1093/nar/gky309

Potter, 2018, HMMER web server: 2018 update, Nucleic Acids Res., 46, W200, 10.1093/nar/gky448

Cesare, 2012, Software Similarity and Classification, 10.1007/978-1-4471-2909-7

Das, 2013, Conformations of intrinsically disordered proteins are influenced by linear sequence distributions of oppositely charged residues, Proc. Natl. Acad. Sci. U.S.A., 110, 13392, 10.1073/pnas.1304749110

Holehouse, 2017, CIDER: resources to analyze sequence-ensemble relationships of intrinsically disordered proteins, Biophys. J., 112, 16, 10.1016/j.bpj.2016.11.3200

Das, 2015, Relating sequence encoded information to form and function of intrinsically disordered proteins, Curr. Opin. Struct. Biol., 32, 102, 10.1016/j.sbi.2015.03.008

Necci, 2016, Large-scale analysis of intrinsic disorder flavors and associated functions in the protein sequence universe, Protein Sci., 25, 2164, 10.1002/pro.3041