The Ultrafast and Accurate Mapping Algorithm FANSe3: Mapping a Human Whole-Genome Sequencing Dataset Within 30 Minutes

Gong Zhang1, Yongjian Zhang2, Jingjie Jin1
1MOE Key Laboratory of Tumor Molecular Biology and Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
2Chi-Biotech Co. Ltd., Shenzhen, 518000, China

Abstract

Aligning the billions of reads generated by next-generation sequencing (NGS) to reference sequences, termed “mapping”, is a time-consuming and computationally intensive step in most NGS applications. A fast, accurate, and robust mapping algorithm is therefore highly needed. We developed the FANSe3 mapping algorithm, which can map a 30× human whole-genome sequencing (WGS) dataset within 30 min, a 50× human whole-exome sequencing (WES) dataset within 30 s, and a typical mRNA-seq dataset within seconds on a single server node, without the need for any hardware acceleration. Like its predecessor FANSe2, FANSe3 keeps the error rate as low as 10⁻⁹ in most cases, which makes it more robust than Burrows–Wheeler transform-based algorithms. The error allowance hardly affected the identification of a driver somatic mutation in clinically relevant WGS data, and FANSe3 provided robust gene expression profiles regardless of the parameter settings and the sequencer used. The novel algorithm, designed for high-performance cloud-computing infrastructures, will break the bottleneck of speed and accuracy in NGS data analysis and promote NGS applications in various fields. The FANSe3 algorithm can be downloaded from http://www.chi-biotech.com/fanse3/.
