InterProScan 5: genome-scale protein function classification
Abstract
Motivation: Robust large-scale sequence analysis is a major challenge in modern genomic science, where biologists frequently need to characterize many millions of sequences. Here, we describe a new Java-based architecture for the widely used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and a complete reimplementation of the software framework, resulting in a flexible and stable system that can use multiprocessor machines, conventional clusters, or both to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBL-EBI FTP site, and the open source code is hosted at Google Code.
Availability and implementation: InterProScan is distributed via FTP at ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/ and the source code is available from http://code.google.com/p/interproscan/.
Contact: http://www.ebi.ac.uk/support