Comprehensive simulation of metagenomic sequencing data with non‐uniform sampling distribution

Quantitative Biology - Volume 6, Issue 2 - Pages 175-185 - 2018
Shansong Liu, Kui Hua, Sijie Chen, Xuegong Zhang
MOE Key Lab of Bioinformatics, Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, China

Abstract

Background: Metagenomic sequencing is a complex sampling procedure from unknown mixtures of many genomes. Metagenome data with known genome compositions are essential both for benchmarking bioinformatics software and for investigating how various factors influence the data. Compared with data from real microbiome samples or from defined microbial mock communities, simulated data generated with proper computational models are better suited to these purposes because they offer more flexibility in controlling multiple factors.

Methods: We developed a non‐uniform metagenomic sequencing simulation system (nuMetaSim) that can mimic various factors in real metagenomic sequencing and reflect multiple properties of real data through customizable parameter settings.

Results: We generated 9 comprehensive metagenomic datasets of differing compositional complexity from 203 bacterial genomes and 2 archaeal genomes related to the human intestinal system.

Conclusion: The data can serve as benchmarks for comparing the performance of different methods in different situations, and the software package allows users to generate simulation data that better reflect the specific properties of their own scenarios.
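The general idea of sampling reads non‐uniformly from a mixture of genomes can be illustrated with a toy sketch. This is not nuMetaSim's implementation: the abstract does not specify its models or parameters, so the function `simulate_reads`, the `position_weights` callback, and the toy genomes below are all hypothetical names chosen for illustration. The sketch assumes that a genome is picked with probability proportional to its relative abundance times its length, and that the read start position may then follow a user-supplied non-uniform weighting.

```python
import random


def simulate_reads(genomes, abundances, n_reads, read_len,
                   position_weights=None, seed=0):
    """Toy read sampler (illustrative only, not nuMetaSim).

    genomes          -- dict: genome name -> sequence string
    abundances       -- dict: genome name -> relative abundance (>= 0)
    position_weights -- optional callable (name, start) -> weight, used to
                        make start positions non-uniform (an assumption)
    Returns a list of (genome name, start position, read sequence).
    """
    rng = random.Random(seed)
    names = list(genomes)
    # Assumption: expected read count scales with abundance * genome length.
    genome_weights = [abundances[n] * len(genomes[n]) for n in names]

    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=genome_weights, k=1)[0]
        seq = genomes[name]
        n_starts = len(seq) - read_len + 1  # number of valid start positions
        if position_weights is None:
            # Uniform sampling along the genome.
            start = rng.randrange(n_starts)
        else:
            # Non-uniform sampling: weight each candidate start position.
            w = [position_weights(name, i) for i in range(n_starts)]
            start = rng.choices(range(n_starts), weights=w, k=1)[0]
        reads.append((name, start, seq[start:start + read_len]))
    return reads


# Hypothetical toy genomes; real inputs would be full reference genomes.
toy_genomes = {
    "genomeA": "ACGTACGTTGCAACGTTAGC",
    "genomeB": "GGGCCCATATATGGCC",
}
toy_abundances = {"genomeA": 0.8, "genomeB": 0.2}

# Example bias: positions near the start of each genome are sampled more often.
example_reads = simulate_reads(
    toy_genomes, toy_abundances, n_reads=10, read_len=5,
    position_weights=lambda name, i: 1.0 / (1 + i),
)
for name, start, seq in example_reads:
    print(name, start, seq)
```

Because the true composition is fixed by `abundances`, the output of such a sampler can be compared directly against what a profiling tool reports, which is the benchmarking use the abstract describes.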
