Metagenomic biomarker discovery and explanation

Nicola Segata1, Jacques Izard2,3, Levi Waldron1, Dirk Gevers4, Larisa Miropolsky1, Wendy S Garrett5,6,7, Curtis Huttenhower1
1Department of Biostatistics, Harvard School of Public Health, Boston, USA
2Department of Oral Medicine, Infection, and Immunity, Harvard School of Dental Medicine, Boston, USA
3Department of Molecular Genetics, The Forsyth Institute, Cambridge, USA
4Microbial Sequencing Center, The Broad Institute of MIT and Harvard, Cambridge, USA
5Department of Medical Oncology, Dana-Farber Cancer Institute, USA
6Department of Medicine, Harvard Medical School, Boston, USA
7Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, USA

Abstract

This study describes and validates a new method for metagenomic biomarker discovery that combines class comparison, tests of biological consistency, and effect size estimation. The method addresses the challenge of finding organisms, genes, or pathways that consistently explain the differences between two or more microbial communities, a central problem in the study of metagenomics. We extensively validate the method on several microbiomes and provide a convenient online interface at http://huttenhower.sph.harvard.edu/lefse/.
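
As a rough illustration of how class comparison and effect size estimation can be chained, the following minimal sketch screens features with a rank-based test and then scores the survivors. It is not the published implementation. The abstract does not specify the tests used, so the Kruskal-Wallis statistic (without tie correction), the fixed cutoff `h_threshold`, the log-ratio effect size, and all function names and sample values are assumptions made for this example. The biological consistency step is omitted because the abstract gives no detail on it.

```python
import math
from itertools import chain
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of groups (no tie correction)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def screen_features(abundance, labels, h_threshold=3.84, pseudocount=1e-6):
    """Keep features whose H statistic reaches the (assumed) cutoff and
    attach a simple effect size: log10 ratio of largest to smallest class mean."""
    selected = {}
    for name, values in abundance.items():
        by_class = {}
        for label, value in zip(labels, values):
            by_class.setdefault(label, []).append(value)
        groups = list(by_class.values())
        h = kruskal_wallis_h(groups)
        if h < h_threshold:
            continue
        means = [mean(group) for group in groups]
        effect = math.log10((max(means) + pseudocount) / (min(means) + pseudocount))
        selected[name] = (h, effect)
    return selected


# Hypothetical relative abundances for two made-up features in six samples.
abundance = {
    "taxon_A": [0.10, 0.12, 0.11, 0.50, 0.55, 0.52],
    "taxon_B": [0.30, 0.28, 0.31, 0.29, 0.30, 0.32],
}
labels = ["control", "control", "control", "case", "case", "case"]
selected = screen_features(abundance, labels)
```

Here only `taxon_A` passes the screen, because its abundances separate the two classes completely, while `taxon_B` varies about as much within classes as between them. The default cutoff of 3.84 approximates the 5% critical value of a chi-square distribution with one degree of freedom, so it suits the two-class case only. With more classes or small samples, a proper p-value would be the better choice.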
