srnaMapper: an optimal mapping tool for sRNA-Seq reads

BMC Bioinformatics - Volume 23 - Pages 1-19 - 2022
Matthias Zytnicki¹, Christine Gaspin¹
¹Unité de Mathématiques et Informatique Appliquées, INRAE, Castanet-Tolosan, France

Abstract

Sequencing is the key method to study the impact of short RNAs, which include micro RNAs, tRNA-derived RNAs, and piwi-interacting RNAs, among others. The first step in making use of the resulting reads is to map them to a genome. Existing mapping tools have been developed with long RNAs in mind and, so far, no tool has been conceived for short RNAs. However, short RNAs have several distinctive features that set them apart from messenger RNAs: they are shorter, they are often redundant, they can be produced by duplicated loci, and they may be edited at their ends. In this work, we present a new tool, srnaMapper, that exhaustively maps these reads with all these features in mind, and is most efficient when applied to reads no longer than 50 base pairs. We show, on several datasets, that srnaMapper is very efficient in terms of computation time and the handling of editing errors: it retrieves all the hits, with an arbitrary number of errors, in a time comparable to that of non-exhaustive tools.
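To make two of the ideas above concrete (redundant reads, and retrieving all hits up to a given number of errors), here is a minimal Python sketch. It is not srnaMapper's algorithm, which the abstract does not describe. It is a plain dynamic-programming scan (Sellers' approximate string matching), chosen only because it is exhaustive by construction. The function names `collapse_reads` and `find_all_hits` are hypothetical, and the sketch scans a single strand of a genome given as a plain string.

```python
from collections import Counter


def collapse_reads(reads):
    """Count identical reads so that each distinct sequence is mapped once.

    Illustrative helper (hypothetical name): short-RNA reads are often
    redundant, so mapping distinct sequences avoids repeated work.
    """
    return Counter(reads)


def find_all_hits(read, genome, max_errors):
    """Return every (end_position, errors) pair such that some substring of
    `genome` ending at 1-based `end_position` is within `max_errors` edits
    (substitutions, insertions, deletions) of `read`.

    Brute-force O(len(read) * len(genome)) scan, for illustration only.
    """
    m = len(read)
    # prev[i] = edit distance between read[:i] and an empty genome prefix.
    prev = list(range(m + 1))
    hits = []
    for j, base in enumerate(genome, start=1):
        # curr[0] = 0 because an alignment may start at any genome position.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if read[i - 1] == base else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra genome base not present in the read
                curr[i - 1] + 1,     # read base not present in the genome
            )
        if curr[m] <= max_errors:
            hits.append((j, curr[m]))
        prev = curr
    return hits
```

For instance, `find_all_hits("ACGT", "ACGTACGTTT", 0)` reports both exact occurrences, ending at positions 4 and 8. Because every end position is examined, no hit within the error bound can be missed; the cost is a running time that a dedicated index-based mapper is designed to avoid.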
