Ab initio gene identification in metagenomic sequences

Nucleic Acids Research - Volume 38, Issue 12 - Page e132 - 2010
Wenhan Zhu1, Alexandre Lomsadze2, Mark Borodovsky2
1School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
2School of Biology, 2 Wallace H. Coulter Department of Biomedical Engineering, 3 School of Computational Science and Engineering and 4 Center for Bioinformatics and Computational Genomics, Georgia Institute of Technology, Atlanta, GA 30332, USA

Abstract

Keywords


References

Chen, 2005, Bioinformatics for whole-genome shotgun sequencing of microbial communities, PLoS Comput. Biol., 1, 106, 10.1371/journal.pcbi.0010024

Venter, 2004, Environmental genome shotgun sequencing of the Sargasso Sea, Science, 304, 66, 10.1126/science.1093857

Krause, 2006, Finding novel genes in bacterial communities isolated from the environment, Bioinformatics, 22, e281, 10.1093/bioinformatics/btl247

Yooseph, 2007, The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families, PLoS Biol., 5, e16, 10.1371/journal.pbio.0050016

Yooseph, 2008, Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering, BMC Bioinformatics, 9, 182, 10.1186/1471-2105-9-182

Larsen, 2003, EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance, BMC Bioinformatics, 4, 21, 10.1186/1471-2105-4-21

Besemer, 2001, GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions, Nucleic Acids Res., 29, 2607, 10.1093/nar/29.12.2607

Delcher, 2007, Identifying bacterial genes and endosymbiont DNA with Glimmer, Bioinformatics, 23, 673, 10.1093/bioinformatics/btm009

Besemer, 1999, Heuristic approach to deriving models for gene finding, Nucleic Acids Res., 27, 3911, 10.1093/nar/27.19.3911

Mills, 2003, Improving gene annotation of complete viral genomes, Nucleic Acids Res., 31, 7041, 10.1093/nar/gkg878

Lomsadze, 2005, Gene identification in novel eukaryotic genomes by self-training algorithm, Nucleic Acids Res., 33, 6494, 10.1093/nar/gki937

Ter-Hovhannisyan, 2008, Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training, Genome Res., 18, 1979, 10.1101/gr.081612.108

Noguchi, 2008, MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes, DNA Res., 15, 387, 10.1093/dnares/dsn027

Noguchi, 2006, MetaGene: prokaryotic gene finding from environmental genome shotgun sequences, Nucleic Acids Res., 34, 5623, 10.1093/nar/gkl723

Hoff, 2008, Gene prediction in metagenomic fragments: a large scale machine learning approach, BMC Bioinformatics, 9, 217, 10.1186/1471-2105-9-217

Hoff, 2009, Orphelia: predicting genes in metagenomic sequencing reads, Nucleic Acids Res., 37, W101, 10.1093/nar/gkp327

Rudner, 1968, Separation of B. subtilis DNA into complementary strands. 3. Direct analysis, Proc. Natl Acad. Sci. USA, 60, 921, 10.1073/pnas.60.3.921

Kattenhorn, 2004, Identification of proteins associated with murine cytomegalovirus virions, J. Virol., 78, 11187, 10.1128/JVI.78.20.11187-11197.2004

Gill, 2006, Metagenomic analysis of the human distal gut microbiome, Science, 312, 1355, 10.1126/science.1124234

Turnbaugh, 2006, An obesity-associated gut microbiome with increased capacity for energy harvest, Nature, 444, 1027, 10.1038/nature05414

Mardis, 2008, The impact of next-generation sequencing technology on genetics, Trends Genet., 24, 133, 10.1016/j.tig.2007.12.007

Randau, 2005, Nanoarchaeum equitans creates functional tRNAs from separate genes for their 5′- and 3′-halves, Nature, 433, 537, 10.1038/nature03233

Sayers, 2009, Database resources of the National Center for Biotechnology Information, Nucleic Acids Res., 37, D5, 10.1093/nar/gkn741

Markowitz, 2008, IMG/M: a data management and analysis system for metagenomes, Nucleic Acids Res., 36, D534, 10.1093/nar/gkm869

Borodovsky, 1993, Genmark – parallel gene recognition for both DNA strands, Comput. Chem., 17, 123, 10.1016/0097-8485(93)85004-V

Azad, 2004, Effects of choice of DNA sequence model structure on gene identification accuracy, Bioinformatics, 20, 993, 10.1093/bioinformatics/bth028

Knight, 2001, A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes, Genome Biol., 2, research0010, 10.1186/gb-2001-2-4-research0010

Chen, 2004, Codon usage between genomes is constrained by genome-wide mutational processes, Proc. Natl Acad. Sci. USA, 101, 3480, 10.1073/pnas.0307827100

Gorban, 2007, The mystery of two straight lines in bacterial genome statistics, Bull. Math. Biol., 69, 2429, 10.1007/s11538-007-9229-6

Lukashin, 1998, GeneMark.hmm: new solutions for gene finding, Nucleic Acids Res., 26, 1107, 10.1093/nar/26.4.1107

Lobry, 2006, Synonymous codon usage and its potential link with optimal growth temperature in prokaryotes, Gene, 385, 128, 10.1016/j.gene.2006.05.033

Nelson, 1999, Evidence for lateral gene transfer between archaea and bacteria from genome sequence of Thermotoga maritima, Nature, 399, 323, 10.1038/20601

Zavala, 2002, Trends in codon and amino acid usage in Thermotoga maritima, J. Mol. Evol., 54, 563, 10.1007/s00239-001-0040-y

Basak, 2004, Investigation on the causes of codon and amino acid usages variation between thermophilic Aquifex aeolicus and mesophilic Bacillus subtilis, J. Biomol. Struct. Dyn., 22, 205, 10.1080/07391102.2004.10506996

Stein, 2002, The generic genome browser: a building block for a model organism system database, Genome Res., 12, 1599, 10.1101/gr.403602

Hoff, 2009, The effect of sequencing errors on metagenomic gene prediction, BMC Genomics, 10, 520, 10.1186/1471-2164-10-520

Antonov, 2010, GeneTack: Frameshift identification in protein coding sequences by the Viterbi algorithm, J. Bioinform. Comput. Biol., 8, 1, 10.1142/S0219720010004847

Tech, 2003, YACOP: enhanced gene prediction obtained by a combination of existing methods, In Silico Biol., 3, 441