InterProScan 5: genome-scale protein function classification

Bioinformatics - Volume 30, Issue 9 - Pages 1236-1240 - 2014
Philip Jones¹, David Binns¹, Hsin-Yu Chang¹, Matthew Fraser¹, Weizhong Li¹, Craig McAnulla¹, Hamish McWilliam¹, John Maslen¹, Alex Mitchell¹, Gift Nuka¹, Sebastien Pesseat¹, A. F. Quinn¹, Amaia Sangrador‐Vegas¹, Maxim Scheremetjew¹, Siew-Yit Yong¹, Rodrigo López¹, Sarah Hunter¹
¹European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton CB10 1SD and ²Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK

Abstract

Motivation: Robust large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterize many millions of sequences. Here, we describe a new Java-based architecture for the widely used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete reimplementation of the software framework, resulting in a flexible and stable system that can use multiprocessor machines and/or conventional clusters to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBL-EBI FTP site, and the open source code is hosted at Google Code.

Availability and implementation: InterProScan is distributed via FTP at ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/ and the source code is available from http://code.google.com/p/interproscan/.

Contact: http://www.ebi.ac.uk/support
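The abstract says the reimplemented framework can spread work over multiprocessor machines and/or conventional clusters. As a rough illustration of that general idea only, and not InterProScan's actual code or API, the Java sketch below splits a set of protein sequences into fixed-size chunks and runs each chunk as an independent task on a local thread pool, then merges the per-chunk results. The class and method names, the chunk size, the thread count and the `analyse` placeholder (which simply returns the sequence length in place of a real member-database search) are all hypothetical assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: chunked, parallel per-sequence analysis. Not InterProScan's API.
public class ChunkedAnalysisSketch {

    // Placeholder analysis step (assumption): returns the sequence length.
    static int analyse(String sequence) {
        return sequence.length();
    }

    // Split a list into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += chunkSize) {
            chunks.add(items.subList(i, Math.min(i + chunkSize, items.size())));
        }
        return chunks;
    }

    // Submit each chunk as an independent task to a fixed thread pool and merge the results.
    static Map<String, Integer> run(Map<String, String> sequences, int chunkSize, int threads) {
        List<String> ids = new ArrayList<>(sequences.keySet());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (List<String> idChunk : chunk(ids, chunkSize)) {
                futures.add(pool.submit(() -> {
                    // Each task only reads shared input and writes to its own map.
                    Map<String, Integer> partial = new TreeMap<>();
                    for (String id : idChunk) {
                        partial.put(id, analyse(sequences.get(id)));
                    }
                    return partial;
                }));
            }
            Map<String, Integer> merged = new TreeMap<>();
            for (Future<Map<String, Integer>> f : futures) {
                try {
                    merged.putAll(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a chunk", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a chunk failed", e.getCause());
                }
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, String> seqs = new TreeMap<>();
        seqs.put("P1", "MKTAYIAKQR");
        seqs.put("P2", "MSEQ");
        seqs.put("P3", "MAAAGG");
        System.out.println(run(seqs, 2, 2)); // prints {P1=10, P2=4, P3=6}
    }
}
```

Because each chunk is processed independently, the same decomposition could in principle be handed to cluster nodes instead of local threads; this sketch only covers the single-machine case.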
