The Ultrafast and Accurate Mapping Algorithm FANSe3: Mapping a Human Whole-Genome Sequencing Dataset Within 30 Minutes

Springer Science and Business Media LLC - 2021

Gong Zhang¹, Yongjian Zhang², Jingjie Jin¹

¹MOE Key Laboratory of Tumor Molecular Biology and Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China

²Chi-Biotech Co. Ltd., Shenzhen, 518000, China

Tóm tắt

AbstractAligning billions of reads generated by the next-generation sequencing (NGS) to reference sequences, termed “mapping”, is the time-consuming and computationally-intensive process in most NGS applications. A Fast, accurate and robust mapping algorithm is highly needed. Therefore, we developed the FANSe3 mapping algorithm, which can map a 30 × human whole-genome sequencing (WGS) dataset within 30 min, a 50 × human whole exome sequencing (WES) dataset within 30 s, and a typical mRNA-seq dataset within seconds in a single-server node without the need for any hardware acceleration feature. Like its predecessor FANSe2, the error rate of FANSe3 can be kept as low as 10–9 in most cases, this is more robust than the Burrows–Wheeler transform-based algorithms. Error allowance hardly affected the identification of a driver somatic mutation in clinically relevant WGS data and provided robust gene expression profiles regardless of the parameter settings and sequencer used. The novel algorithm, designed for high-performance cloud-computing after infrastructures, will break the bottleneck of speed and accuracy in NGS data analysis and promote NGS applications in various fields. The FANSe3 algorithm can be downloaded from the website: http://www.chi-biotech.com/fanse3/.

Từ khóa

Tài liệu tham khảo

Cao X, Zhang G (2017) Application of the hyper-accurate mapping algorithm FANSe for next-generation sequencing in non-model organisms. Sci Sin Vitae 47(7):702–707

Chang C et al (2014) Systematic analyses of the transcriptome, translatome, and proteome provide a global view and potential strategy for the C-HPP. J Proteome Res 13(1):38–49

Dobin A et al (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1):15–21

Fonseca NA et al (2012) Tools for mapping high-throughput sequencing data. Bioinformatics 28(24):3169–3177

Homer N, Merriman B, Nelson SF (2009) BFAST: an alignment tool for large scale genome resequencing. PLoS ONE 4(11):e7767

Hung JH, Weng Z (2017) Mapping billions of short reads to a reference genome. Cold Spring Harb Protoc

Kim D et al (2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14(4):R36

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12(4):357–360

Li S et al (2017) In vitro biomimetic platforms featuring a perfusion system and 3D spheroid culture promote the construction of tissue-engineered corneal endothelial layers. Sci Rep 7(1):777

Liu W et al (2018) TranslatomeDB: a comprehensive database and cloud-based analysis platform for translatome sequencing data. Nucleic Acids Res 46(D1):D206–D212

Mai Z et al (2017) Low-cost, low-bias and low-input RNA-seq with high experimental verifiability based on semiconductor sequencing. Sci Rep 7(1):1053

Nekrutenko A, Taylor J (2012) Next-generation sequencing data interpretation: enhancing reproducibility and accessibility. Nat Rev Genet 13(9):667–672

Nogueira D, Tomas P, Roma N (2016) BowMapCL: Burrows–Wheeler mapping on multiple heterogeneous accelerators. IEEE/ACM Trans Comput Biol Bioinform 13(5):926–938

O’Rawe J et al (2013) Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med 5(3):28

Park JY et al (2015) Clinical exome performance for reporting secondary genetic findings. Clin Chem 61(1):213–220

Ruffalo M, LaFramboise T, Koyuturk M (2011) Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics 27(20):2790–2796

Schbath S et al (2012) Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis. J Comput Biol 19(6):796–813

Wang K et al (2010) MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res 38(18):e178

Wang T et al (2013) Translating mRNAs strongly correlate to proteins in a multivariate manner and their translation ratios are phenotype specific. Nucleic Acids Res 41(9):4743–4754

Wu X et al (2014) Iterative genome correction largely improves proteomic analysis of nonmodel organisms. J Proteome Res 13(6):2724–2734

Xiao CL et al (2014) FANSe2: a robust and cost-efficient alignment tool for quantitative next-generation sequencing applications. PLoS ONE 9(4):e94250

Xu S et al (2015) Appraisal of the missing proteins based on the mRNAs bound to ribosomes. J Proteome Res 14(12):4976–4984

Zhang G et al (2012) FANSe: an accurate algorithm for quantitative mapping of large scale sequencing reads. Nucleic Acids Res 40(11):e83

Zhang G, Wang T, He Q (2014) How to discover new proteins—translatome profiling. Sci China Life Sci 57(3):358–360

Zhao P et al (2017) Protein-level integration strategy of multiengine MS spectra search results for higher confidence and sequence coverage. J Proteome Res 16(12):4446–4454

Zhong J et al (2014) Resolving chromosome-centric human proteome with translating mRNA analysis: a strategic demonstration. J Proteome Res 13(1):50–59

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích ảnh hưởng của các bài báo, công bố khoa học Việt Nam và Quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Hệ thống CSDL Khoa học & Công nghệ SciBase

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Hệ thống hội thảo khoa học Việt Nam

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA

Thông tin liên hệ & hỗ trợ

Đơn vị chủ quản, phát triển và vận hành: Công ty Cổ phần Metis

Địa chỉ liên hệ: 26A Lê Đức Thọ, Phường Từ Liêm, Thành phố Hà Nội

Số giấy chứng nhận ĐKKD: 0109293202 cấp ngày 03/08/2020 tại Sở Kế hoạch và Đầu tư thành phố Hà Nội

Người quản lý và chịu trách nhiệm nội dung: Nguyễn Ngọc Sơn

Hotline: 0566.685.688

Email: [email protected]