Metagenomics - Hướng dẫn từ việc lấy mẫu đến phân tích dữ liệu

Springer Science and Business Media LLC - Tập 2 - Trang 1-12 - 2012
Torsten Thomas1, Jack Gilbert2,3, Folker Meyer2,4
1School of Biotechnology and Biomolecular Sciences & Centre for Marine Bio-Innovation, The University of New South Wales, Sydney, Australia
2Argonne National Laboratory, Argonne, USA
3Department of Ecology and Evolution, University of Chicago, Chicago, USA
4Computation Institute, University of Chicago, Chicago, USA

Tóm tắt

Metagenomics áp dụng một loạt công nghệ gen và công cụ tin sinh học để truy cập trực tiếp vào nội dung di truyền của toàn bộ cộng đồng sinh vật. Lĩnh vực metagenomics đã đóng vai trò quan trọng trong việc thúc đẩy tiến bộ đáng kể trong sinh thái vi sinh vật, tiến hóa và sự đa dạng trong 5 đến 10 năm qua, và hiện nay nhiều phòng thí nghiệm nghiên cứu đang tích cực tham gia vào lĩnh vực này. Sự gia tăng các hoạt động cũng dẫn đến một lượng lớn kiến thức và chuyên môn phương pháp, điều này cần thiết để hướng dẫn các phát triển trong tương lai của lĩnh vực. Bài tổng quan này tóm tắt các ý kiến hiện tại trong metagenomics, và cung cấp hướng dẫn thực tế cũng như lời khuyên về quá trình xử lý mẫu, công nghệ giải trình tự, lắp ráp, phân loại, chú thích, thiết kế thí nghiệm, phân tích thống kê, lưu trữ dữ liệu và chia sẻ dữ liệu. Khi ngày càng nhiều tập dữ liệu metagenomic được tạo ra, việc có sẵn các quy trình chuẩn hóa, cùng với việc lưu trữ và phân tích dữ liệu chung, trở nên quan trọng hơn để đảm bảo rằng kết quả của các dự án riêng lẻ có thể được đánh giá và so sánh.

Từ khóa

#metagenomics #sinh thái vi sinh vật #đa dạng sinh học #công nghệ giải trình tự #phân tích thống kê

Tài liệu tham khảo

Handelsman J, Rondon MR, Brady SF, Clardy J, Goodman RM: Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products. Chem Biol. 1998, 5 (10): R245-249. 10.1016/S1074-5521(98)90108-9. Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ, Richardson PM, Solovyev VV, Rubin EM, Rokhsar DS, Banfield JF: Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature. 2004, 428 (6978): 37-43. 10.1038/nature02340. Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers YH, Smith HO: Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004, 304 (5667): 66-74. 10.1126/science.1093857. Simon C, Daniel R: Metagenomic analyses: past and future trends. Appl Environ Microbiol. 2011, 77 (4): 1153-1161. 10.1128/AEM.02345-10. Wilmes P, Bond PL: Metaproteomics: studying functional gene expression in microbial ecosystems. Trends Microbiol. 2006, 14 (2): 92-97. 10.1016/j.tim.2005.12.006. Gilbert JA, Field D, Huang Y, Edwards R, Li W, Gilna P, Joint I: Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS One. 2008, 3 (8): e3042-10.1371/journal.pone.0003042. Beja O, Aravind L, Koonin EV, Suzuki MT, Hadd A, Nguyen LP, Jovanovich SB, Gates CM, Feldman RA, Spudich JL, Spudich EN, DeLong EF: Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science. 2000, 289 (5486): 1902-1906. 10.1126/science.289.5486.1902. Nicol GW, Schleper C: Ammonia-oxidising Crenarchaeota: important players in the nitrogen cycle?. Trends Microbiol. 2006, 14 (5): 207-212. 10.1016/j.tim.2006.03.004. Burke C, Kjelleberg S, Thomas T: Selective extraction of bacterial DNA from the surfaces of macroalgae. Appl Environ Microbiol. 2009, 75 (1): 252-256. 10.1128/AEM.01630-08. Delmont TO, Robe P, Clark I, Simonet P, Vogel TM: Metagenomic comparison of direct and indirect soil DNA extraction approaches. J Microbiol Methods. 2011, 86 (3): 397-400. 10.1016/j.mimet.2011.06.013. Knight R, Desai N, Field D, Fierer N, Fuhrman J, Gordon J, Hu B, Hugenholtz P, Jansson J, Meyer F, Stevens R, Bailey M, Kowalchuk G, Gilbert J: Designing Better Metagenomic Surveys: The role of experimental design and metadata capture in making useful metagenomic datasets for ecology and biotechnology. Nature Biotechnology. in review Thomas T, Rusch D, DeMaere MZ, Yung PY, Lewis M, Halpern A, Heidelberg KB, Egan S, Steinberg PD, Kjelleberg S: Functional genomic signatures of sponge bacteria reveal unique and shared features of symbiosis. ISME J. 2010, 4 (12): 1557-1567. 10.1038/ismej.2010.74. Palenik B, Ren Q, Tai V, Paulsen IT: Coastal Synechococcus metagenome reveals major roles for horizontal gene transfer and plasmids in population diversity. Environ Microbiol. 2009, 11 (2): 349-359. 10.1111/j.1462-2920.2008.01772.x. Angly FE, Felts B, Breitbart M, Salamon P, Edwards RA, Carlson C, Chan AM, Haynes M, Kelley S, Liu H, Mahaffy JM, Mueller JE, Nulton J, Olson R, Parsons R, Rayhawk S, Suttle CA, Rohwer F: The marine viromes of four oceanic regions. PLoS Biol. 2006, 4 (11): e368-10.1371/journal.pbio.0040368. Abbai NS, Govender A, Shaik R, Pillay B: Pyrosequence analysis of unamplified and whole genome amplified DNA from hydrocarbon-contaminated groundwater. Mol Biotechnol. 2011 Lasken RS: Genomic DNA amplification by the multiple displacement amplification (MDA) method. Biochem Soc Trans. 2009, 37 (Pt 2): 450-453. Ishoey T, Woyke T, Stepanauskas R, Novotny M, Lasken RS: Genomic sequencing of single microbial cells from environmental samples. Curr Opin Microbiol. 2008, 11 (3): 198-204. 10.1016/j.mib.2008.05.006. Goltsman DS, Denef VJ, Singer SW, VerBerkmoes NC, Lefsrud M, Mueller RS, Dick GJ, Sun CL, Wheeler KE, Zemla A, Baker BJ, Hauser L, Land M, Shah MB, Thelen MP, Hettich RL, Banfield JF: Community genomic and proteomic analyses of chemoautotrophic iron-oxidizing "Leptospirillum rubarum" (Group II) and " Leptospirillum ferrodiazotrophum" (Group III) bacteria in acid mine drainage biofilms. Appl Environ Microbiol. 2009, 75 (13): 4599-4615. 10.1128/AEM.02943-08. Sorek R, Zhu Y, Creevey CJ, Francino MP, Bork P, Rubin EM: Genome-wide experimental determination of barriers to horizontal gene transfer. Science. 2007, 318 (5855): 1449-1452. 10.1126/science.1147112. Metzker ML: Sequencing technologies - the next generation. Nat Rev Genet. 2010, 11 (1): 31-46. 10.1038/nrg2626. Mardis ER: The impact of next-generation sequencing technology on genetics. Trends Genet. 2008, 24 (3): 133-141. 10.1016/j.tig.2007.12.007. Niu B, Fu L, Sun S, Li W: Artificial and natural duplicates in pyrosequencing reads of metagenomic data. BMC Bioinformatics. 2010, 11: 187-10.1186/1471-2105-11-187. Teal TK, Schmidt TM: Identifying and removing artificial replicates from 454 pyrosequencing data. Cold Spring Harb Protoc. 2010, 2010 (4): pdb prot5409- Rho M, Tang H, Ye Y: FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res. 2010, 38 (20): e191-10.1093/nar/gkq747. Wommack KE, Bhavsar J, Ravel J: Metagenomics: read length matters. Appl Environ Microbiol. 2008, 74 (5): 1453-1463. 10.1128/AEM.02181-07. White RA, Blainey PC, Fan HC, Quake SR: Digital PCR provides sensitive and absolute calibration for high throughput sequencing. BMC Genomics. 2009, 10: 116-10.1186/1471-2164-10-116. Adey A, Morrison HG, Asan Xun X, Kitzman JO, Turner EH, Stackhouse B, MacKenzie AP, Caruccio NC, Zhang X, Shendure J: Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol. 2010, 11 (12): R119-10.1186/gb-2010-11-12-r119. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Keira Cheetham R, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR, et al: Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008, 456 (7218): 53-59. 10.1038/nature07517. Nakamura K, Oshima T, Morimoto T, Ikeda S, Yoshikawa H, Shiwa Y, Ishikawa S, Linak MC, Hirai A, Takahashi H, Altaf-Ul-Amin M, Ogasawara N, Kanaya S: Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res. 2011, 39 (13): e90-10.1093/nar/gkr344. Hess M, Sczyrba A, Egan R, Kim TW, Chokhawala H, Schroth G, Luo S, Clark DS, Chen F, Zhang T, Mackie RI, Pennacchio LA, Tringe SG, Visel A, Woyke T, Wang Z, Rubin EM: Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science. 2011, 331 (6016): 463-467. 10.1126/science.1200387. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, Cao J, Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, Batto JM, Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E, Renault P, et al: A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010, 464 (7285): 59-65. 10.1038/nature08821. Gulig PA, de Crecy-Lagard V, Wright AC, Walts B, Telonis-Scott M, McIntyre LM: SOLiD sequencing of four Vibrio vulnificus genomes enables comparative genomic analysis and identification of candidate clade-specific virulence genes. BMC Genomics. 2010, 11: 512-10.1186/1471-2164-11-512. Tyler HL, Roesch LF, Gowda S, Dawson WO, Triplett EW: Confirmation of the sequence of 'Candidatus Liberibacter asiaticus' and assessment of microbial diversity in Huanglongbing-infected citrus phloem using a metagenomic approach. Mol Plant Microbe Interact. 2009, 22 (12): 1624-1634. 10.1094/MPMI-22-12-1624. Kunin V, Raes J, Harris JK, Spear JR, Walker JJ, Ivanova N, von Mering C, Bebout BM, Pace NR, Bork P, Hugenholtz P: Millimeter-scale genetic gradients and community-level molecular convergence in a hypersaline microbial mat. Mol Syst Biol. 2008, 4: 198- Rasko DA, Webster DR, Sahl JW, Bashir A, Boisen N, Scheutz F, Paxinos EE, Sebra R, Chin CS, Iliopoulos D, Klammer A, Peluso P, Lee L, Kislyuk AO, Bullard J, Kasarskis A, Wang S, Eid J, Rank D, Redman JC, Steyert SR, Frimodt-Moller J, Struve C, Petersen AM, Krogfelt KA, Nataro JP, Schadt EE, Waldor MK: Origins of the E. coli strain causing an outbreak of hemolytic-uremic syndrome in Germany. N Engl J Med. 2011, 365 (8): 709-717. 10.1056/NEJMoa1106920. Drmanac R, Sparks AB, Callow MJ, Halpern AL, Burns NL, Kermani BG, Carnevali P, Nazarenko I, Nilsen GB, Yeung G, Dahl F, Fernandez A, Staker B, Pant KP, Baccash J, Borcherding AP, Brownley A, Cedeno R, Chen L, Chernikoff D, Cheung A, Chirita R, Curson B, Ebert JC, Hacker CR, Hartlage R, Hauser B, Huang S, Jiang Y, Karpinchyk V, et al: Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science. 2010, 327 (5961): 78-81. 10.1126/science.1181498. Chevreux B, Wetter T, Suhai S: Genome Sequence Assembly Using Trace Signals and Additional Sequence Information Computer Science and Biology. Proceedings of the German Conference on Bioinformatics. 1999, 99: 45-56. Miller JR, Koren S, Sutton G: Assembly algorithms for next-generation sequencing data. Genomics. 2010, 95 (6): 315-327. 10.1016/j.ygeno.2010.03.001. Pevzner PA, Tang H, Waterman MS: An Eulerian path approach to DNA fragment assembly. Proc Natl Acad Sci USA. 2001, 98 (17): 9748-9753. 10.1073/pnas.171285098. Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008, 18 (5): 821-829. 10.1101/gr.074492.107. Li R, Li Y, Kristiansen K, Wang J: SOAP: short oligonucleotide alignment program. Bioinformatics. 2008, 24 (5): 713-714. 10.1093/bioinformatics/btn025. Peng Y, Leung HC, Yiu SM, Chin FY: Meta-IDBA: a de Novo assembler for metagenomic data. Bioinformatics. 2011, 27 (13): i94-101. 10.1093/bioinformatics/btr216. Glass EM, Wilkening J, Wilke A, Antonopoulos D, Meyer F: Using the metagenomics RAST server (MG-RAST) for analyzing shotgun metagenomes. Cold Spring Harb Protoc. 2010, 2010 (1): pdb prot5368 McHardy AC, Martin HG, Tsirigos A, Hugenholtz P, Rigoutsos I: Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods. 2007, 4 (1): 63-72. 10.1038/nmeth976. Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006, 22 (13): 1658-1659. 10.1093/bioinformatics/btl158. Edgar RC: Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010, 26 (19): 2460-2461. 10.1093/bioinformatics/btq461. Chan CK, Hsu AL, Halgamuge SK, Tang SL: Binning sequences using very sparse labels within a metagenome. BMC Bioinformatics. 2008, 9: 215-10.1186/1471-2105-9-215. Zheng H, Wu H: Short prokaryotic DNA fragment binning using a hierarchical classifier based on linear discriminant analysis and principal component analysis. J Bioinform Comput Biol. 2010, 8 (6): 995-1011. 10.1142/S0219720010005051. Diaz NN, Krause L, Goesmann A, Niehaus K, Nattkemper TW: TACOA: taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinformatics. 2009, 10: 56-10.1186/1471-2105-10-56. Markowitz VM, Ivanova NN, Szeto E, Palaniappan K, Chu K, Dalevi D, Chen IM, Grechkin Y, Dubchak I, Anderson I, Lykidis A, Mavromatis K, Hugenholtz P, Kyrpides NC: IMG/M: a data management and analysis system for metagenomes. Nucleic Acids Res. 2008, D534-538. 36 Database Huson DH, Auch AF, Qi J, Schuster SC: MEGAN analysis of metagenomic data. Genome Res. 2007, 17 (3): 377-386. 10.1101/gr.5969107. Krause L, Diaz NN, Goesmann A, Kelley S, Nattkemper TW, Rohwer F, Edwards RA, Stoye J: Phylogenetic classification of short environmental DNA fragments. Nucleic Acids Res. 2008, 36 (7): 2230-2239. 10.1093/nar/gkn038. Monzoorul Haque M, Ghosh TS, Komanduri D, Mande SS: SOrt-ITEMS: Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences. Bioinformatics. 2009, 25 (14): 1722-1730. 10.1093/bioinformatics/btp317. Liu B, Gibbons T, Ghodsi M, Treangen T, Pop M: Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences. BMC Genomics. 2011, 12 (Suppl 2): S4-10.1186/1471-2164-12-S2-S4. Brady A, Salzberg SL: Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nat Methods. 2009, 6 (9): 673-676. 10.1038/nmeth.1358. Leung HC, Yiu SM, Yang B, Peng Y, Wang Y, Liu Z, Chen J, Qin J, Li R, Chin FY: A robust and accurate binning algorithm for metagenomic sequences with arbitrary species abundance ratio. Bioinformatics. 2011, 27 (11): 1489-1495. 10.1093/bioinformatics/btr186. Yung PY, Burke C, Lewis M, Egan S, Kjelleberg S, Thomas T: Phylogenetic screening of a bacterial, metagenomic library using homing endonuclease restriction and marker insertion. Nucleic Acids Res. 2009, 37 (21): e144-10.1093/nar/gkp746. Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O: The RAST Server: rapid annotations using subsystems technology. BMC Genomics. 2008, 9: 75-10.1186/1471-2164-9-75. Markowitz VM, Mavromatis K, Ivanova NN, Chen IM, Chu K, Kyrpides NC: IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics. 2009, 25 (17): 2271-2278. 10.1093/bioinformatics/btp393. Lukashin AV, Borodovsky M: GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res. 1998, 26 (4): 1107-1115. 10.1093/nar/26.4.1107. Delcher AL, Harmon D, Kasif S, White O, Salzberg SL: Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 1999, 27 (23): 4636-4641. 10.1093/nar/27.23.4636. McHardy ACZ, Wenhan Martin HGL, Alexandre Tsirigos A, Hugenholtz P, Rigoutsos IB, Mark : Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods. 2007, 4 (1): 63-72. 10.1038/nmeth976. Noguchi H, Taniguchi T, Itoh T: MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes. DNA Res. 2008, 15 (6): 387-396. 10.1093/dnares/dsn027. Hoff KJ, Lingner T, Meinicke P, Tech M: Orphelia: predicting genes in metagenomic sequencing reads. Nucleic Acids Res. 2009, W101-105. 37 Web Server Yok NG, Rosen GL: Combining gene prediction methods to improve metagenomic gene annotation. BMC Bioinformatics. 2011, 12: 20-10.1186/1471-2105-12-20. Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A: Rfam: updates to the RNA families database. Nucleic Acids Res. 2009, D136-140. 37 Database Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997, 25 (5): 955-964. 10.1093/nar/25.5.955. Bendtsen JD, Nielsen H, von Heijne G, Brunak S: Improved prediction of signal peptides: SignalP 3.0. J Molec Biol. 2004, 340 (4): 783-795. 10.1016/j.jmb.2004.05.028. Bland C, Ramsey TL, Sabree F, Lowe M, Brown K, Kyrpides NC, Hugenholtz P: CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics. 2007, 8: 209-10.1186/1471-2105-8-209. Grissa I, Vergnaud G, Pourcel C: CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Res. 2007, W52-57. 35 Web Server Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J, Glöckner FO: SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 2007, 35 (21): 7188-7196. 10.1093/nar/gkm864. DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL: Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol. 2006, 72 (7): 5069-5072. 10.1128/AEM.03006-05. Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ, Kulam-Syed-Mohideen AS, McGarrell DM, Marsh T, Garrity GM, Tiedje JM: The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res. 2009, D141-145. 37 Database Sun S, Chen J, Li W, Altintas I, Lin A, Peltier S, Stocks K, Allen EE, Ellisman M, Grethe J, Wooley J: Community cyberinfrastructure for Advanced Microbial Ecology Research and Analysis: the CAMERA resource. Nucleic Acids Res. 2011, D546-551. 39 Database Gilbert JA, Field D, Swift P, Thomas S, Cummings D, Temperton B, Weynberg K, Huse S, Hughes M, Joint I, Somerfield PJ, Muhling M: The taxonomic and functional diversity of microbes at a temperate coastal site: a 'multi-omic' study of seasonal and diel temporal variation. PLoS One. 2010, 5 (11): e15545-10.1371/journal.pone.0015545. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, Eisen JA, Heidelberg KB, Manning G, Li W, Jaroszewski L, Cieplak P, Miller CS, Li H, Mashiyama ST, Joachimiak MP, van Belle C, Chandonia JM, Soergel DA, Zhai Y, Natarajan K, Lee S, Raphael BJ, Bafna V, Friedman R, Brenner SE, Godzik A, Eisenberg D, Dixon JE, Taylor SS, et al: The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol. 2007, 5 (3): e16-10.1371/journal.pbio.0050016. Godzik A: Metagenomics and the protein universe. Curr Opin Struct Biol. 2011, 21 (3): 398-403. 10.1016/j.sbi.2011.03.010. Wilkening J, Desai N, Meyer F, A W: Using clouds for metagenomics - case study. IEEE Cluster. 2009 Ye Y, Choi JH, Tang H: RAPSearch: a fast protein similarity search tool for short reads. BMC Bioinformatics. 2011, 12: 159-10.1186/1471-2105-12-159. Kent WJ: BLAT-the BLAST-like alignment tool. Genome Res. 2002, 12 (4): 656-664. Wang W, Zhang P, Liu X: Short read DNA fragment anchoring algorithm. BMC Bioinformatics. 2009, 10 (Suppl 1): S17-10.1186/1471-2105-10-S1-S17. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M: The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004, D277-280. 32 Database Muller J, Szklarczyk D, Julien P, Letunic I, Roth A, Kuhn M, Powell S, von Mering C, Doerks T, Jensen LJ, Bork P: eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations. Nucleic Acids Res. 2010, D190-195. 38 Database Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA: The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003, 4: 41-10.1186/1471-2105-4-41. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, Holm L, Sonnhammer EL, Eddy SR, Bateman A: The Pfam protein families database. Nucleic Acids Res. 2010, D211-222. 38 Database Selengut JD, Haft DH, Davidsen T, Ganapathy A, Gwinn-Giglio M, Nelson WC, Richter AR, White O: TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes. Nucleic Acids Res. 2007, D260-264. 35 Database Field D, Amaral-Zettler L, Cochrane G, Cole JR, Dawyndt P, Garrity GM, Gilbert J, Glockner FO, Hirschman L, Karsch-Mizrachi I, Klenk HP, Knight R, Kottmann R, Kyrpides N, Meyer F, San Gil I, Sansone SA, Schriml LM, Sterk P, Tatusova T, Ussery DW, White O, Wooley J, Yilmaz P, Gilbert JA, Johnston A, Vaughan R, Hunter C, Park J, Morrison N, et al: The Genomic Standards Consortium: Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. PLoS Biol. 2011, 9 (6): e1001088-10.1371/journal.pbio.1001088. Prosser JI: Replicate or lie. Environ Microbiol. 2010, 12 (7): 1806-1810. 10.1111/j.1462-2920.2010.02201.x. Clarke KR: Non-parametric multivariate analyses of changes in community structure. Australian J Ecology. 1993, 117-143. 18 White JR, Nagarajan N, Pop M: Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput Biol. 2009, 5 (4): e1000352-10.1371/journal.pcbi.1000352. Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE, Sogin ML, Jones WJ, Roe BA, Affourtit JP, Egholm M, Henrissat B, Heath AC, Knight R, Gordon JI: A core gut microbiome in obese and lean twins. Nature. 2009, 457 (7228): 480-484. 10.1038/nature07540. Kristiansson E, Hugenholtz P, Dalevi D: ShotgunFunctionalizeR: an R-package for functional comparison of metagenomes. Bioinformatics. 2009, 25 (20): 2737-2738. 10.1093/bioinformatics/btp508. Burke C, Steinberg P, Rusch D, Kjelleberg S, Thomas T: Bacterial community assembly based on functional genes rather than species. Proc Natl Acad Sci USA. 2011, 108 (34): 14288-14293. 10.1073/pnas.1101591108. Mou X, Sun S, Edwards RA, Hodson RE, Moran MA: Bacterial carbon processing by generalist species in the coastal ocean. Nature. 2008, 451 (7179): 708-711. 10.1038/nature06513. Yilmaz P, Kottmann R, Field D, Knight R, Cole JR, Amaral-Zettler L, Gilbert JA, Karsch-Mizrachi I, Johnston A, Cochrane G, Vaughan R, Hunter C, Park J, Morrison N, Rocca-Serra P, Sterk P, Arumugam M, Bailey M, Baumgartner L, Birren BW, Blaser MJ, Bonazzi V, Booth T, Bork P, Bushman FD, Buttigieg PL, Chain PS, Charlson E, Costello EK, Huot-Creasy H, et al: Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat Biotechnol. 2011, 29 (5): 415-420. 10.1038/nbt.1823. Hsi-Yang Fritz M, Leinonen R, Cochrane G, Birney E: Efficient storage of high throughput DNA sequencing data using reference-based compression. Genome Res. 2011, 21 (5): 734-740. 10.1101/gr.114819.110.