A novel hybrid gene prediction method employing protein multiple sequence alignments

Bioinformatics - Volume 27, Issue 6 - Pages 757-763 - 2011

Oliver Keller¹, Martin Kollmar¹, Mario Stanke¹, Stephan Waack¹

¹ 1 Institute of Computer Science, University of Göttingen, Goldschmidtstrasse 7; 2 Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen; and 3 Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Strasse 47, 17487 Greifswald, Germany

Abstract

Motivation: As improved DNA sequencing techniques have enormously increased the speed of producing new eukaryotic genome assemblies, the further development of automated gene prediction methods continues to be essential. While the classification of proteins into families relies heavily on correct gene predictions, it can at the same time provide a source of additional information for the prediction, complementary to the sources presently used.

Results: We extended the gene prediction software AUGUSTUS by a method that employs block profiles generated from multiple sequence alignments as a protein signature to improve the accuracy of the prediction. Equipped with profiles modelling human dynein heavy chain (DHC) proteins and other families, AUGUSTUS was run on the genomic sequences known to contain members of these families. Compared with the ab initio version of AUGUSTUS, the rate of genes predicted with high accuracy increased dramatically.

Availability: The AUGUSTUS project web page is located at http://augustus.gobics.de, with the executable program as well as the source code available for download.

Contact: [email protected]; [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
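To make the term "block profile" in the Results more concrete, here is a minimal Python sketch of one common way to turn an ungapped block of a multiple sequence alignment into a position-specific frequency profile and to score candidate peptides against it. This is an illustration only, not the AUGUSTUS implementation: the function names (`block_profile`, `log_odds_score`, `best_window`), the toy block, the pseudocount of 1 and the uniform background distribution are all assumptions made for this example.

```python
import math
from collections import Counter

# The 20 standard amino acids; non-standard symbols are not handled in this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def block_profile(block, pseudocount=1.0):
    """Per-column amino-acid frequencies of an ungapped alignment block.

    `block` is a list of equal-length strings, one row per aligned sequence.
    The pseudocount (an assumed value) keeps unseen residues from getting
    probability zero.
    """
    width = len(block[0])
    if any(len(row) != width for row in block):
        raise ValueError("all rows of a block must have the same length")
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(width):
        counts = Counter(row[col] for row in block)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def log_odds_score(profile, peptide, background=1.0 / len(AMINO_ACIDS)):
    """Sum of log(profile frequency / background) over all block columns.

    A uniform background is assumed here purely for simplicity.
    """
    if len(peptide) != len(profile):
        raise ValueError("peptide length must equal the block width")
    return sum(
        math.log(column[aa] / background)
        for column, aa in zip(profile, peptide)
    )


def best_window(profile, protein):
    """Return (score, start) of the best-scoring window in `protein`."""
    width = len(profile)
    return max(
        (log_odds_score(profile, protein[i:i + width]), i)
        for i in range(len(protein) - width + 1)
    )


# Hypothetical toy block, not taken from the paper.
toy_block = ["GKSTL", "GKSSL", "GKTTL", "GRSTL"]
toy_profile = block_profile(toy_block)
score, start = best_window(toy_profile, "MAAGKSTLQQ")
print(f"best window starts at position {start} with score {score:.2f}")
```

In this toy setting, a window that matches the conserved columns of the block scores higher than unrelated sequence. A gene finder could use a signal of this kind as additional evidence for coding regions, although how AUGUSTUS actually integrates profiles into its model is described in the paper, not here.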
