Reducing bias in RNA sequencing data: a novel approach to compute counts

Francesca Finotello1, Enrico Lavezzo2, Luca Bianco3, Luisa Barzon2, Paolo Emilio Mazzon1, Paolo Fontana3, Stefano Toppo2, Barbara Di Camillo1
1Department of Information Engineering, University of Padova, Padova 35131, Italy
2Department of Molecular Medicine, University of Padova, Padova, 35131, Italy
3Research and Innovation Centre, Edmund Mach Foundation, Trento, 38010, Italy

Tóm tắt

Từ khóa


Tài liệu tham khảo

Shendure J, Ji H: Next-generation DNA sequencing. Nat Biotechnol. 2008, 26 (10): 1135-1145. 10.1038/nbt1486.

Mardis ER: The impact of next-generation sequencing technology on genetics. Trends Genet. 2008, 24 (3): 133-141. 10.1016/j.tig.2007.12.007.

Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008, 320 (5881): 1344-1349. 10.1126/science.1158441.

Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009, 10 (1): 57-63. 10.1038/nrg2484.

Crawford JE, Guelbeogo WM, Sanou A, Traoré A, Vernick KD, Sagnon N, Lazzaro BP: De novo transcriptome sequencing in Anopheles funestus using Illumina RNA-seq technology. PLoS one. 2010, 5 (12): e14202-10.1371/journal.pone.0014202.

Vera JC, Wheat CW, Fescemyer HW, Frilander MJ, Crawford DL, Hanski I, Marden JH: Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Mol Ecol. 2008, 17 (7): 1636-1647. 10.1111/j.1365-294X.2008.03666.x.

Roberts A, Pimentel H, Trapnell C, Pachter L: Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011, 27 (17): 2325-2329. 10.1093/bioinformatics/btr355.

Rozowsky J, Abyzov A, Wang J, Alves P, Raha D, Harmanci A, Leng J, Bjornson R, Kong Y, Kitabayashi N: AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol Syst Biol. 2011, 7 (1): 522-

Peng Z, Cheng Y, Tan BC, Kang L, Tian Z, Zhu Y, Zhang W, Liang Y, Hu X, Tan X: Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome. Nat Biotechnol. 2012, 30 (3): 253-260. 10.1038/nbt.2122.

Bahn JH, Lee J, Li G, Greer C, Peng G, Xiao X: Accurate identification of A-to-I RNA editing in human by transcriptome sequencing. Genome Res. 2012, 22 (1): 142-150. 10.1101/gr.124107.111.

Westermann AJ, Gorski SA, Vogel J: Dual RNA-seq of pathogen and host. Nat Rev Microbiol. 2012, 10 (9): 618-630. 10.1038/nrmicro2852.

Shendure J: The beginning of the end for microarrays?. Nat Methods. 2008, 5 (7): 585-587. 10.1038/nmeth0708-585.

Robinson MD, Oshlack A: A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010, 11 (3): R25-10.1186/gb-2010-11-3-r25.

Finotello F, Lavezzo E, Barzon L, Mazzon P, Fontana P, Toppo S, Di Camillo B: A strategy to reduce technical variability and bias in RNA sequencing data. EMBnet journal. 2012, 18 (B): 65-67. 10.14806/ej.18.B.552. [http://journal.embnet.org/index.php/embnetjournal/article/view/552]

Gao L, Fang Z, Zhang K, Zhi D, Cui X: Length bias correction for RNA-seq data in gene set analyses. Bioinformatics. 2011, 27 (5): 662-669. 10.1093/bioinformatics/btr005.

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5 (7): 621-628. 10.1038/nmeth.1226.

Young MD, Wakefield MJ, Smyth GK, Oshlack A: Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biol. 2010, 11 (2): R14-10.1186/gb-2010-11-2-r14.

Oshlack A, Robinson MD, Young MD: From RNA-seq reads to differential expression results. Genome Biol. 2010, 11 (12): 220-10.1186/gb-2010-11-12-220.

Bullard JH, Purdom E, Hansen KD, Dudoit S: Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics. 2010, 11: 94-10.1186/1471-2105-11-94.

Benjamini Y, Speed TP: Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res. 2012, 40 (10): e72-10.1093/nar/gks001.

Dohm JC, Lottaz C, Borodina T, Himmelbauer H: Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res. 2008, 36 (16): e105-10.1093/nar/gkn425.

Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR: Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008, 456 (7218): 53-59. 10.1038/nature07517.

Finotello F, Lavezzo E, Fontana P, Peruzzo D, Albiero A, Barzon L, Falda M, Di Camillo B, Toppo S: Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data. Brief Bioinform. 2012, 13 (3): 269-280. 10.1093/bib/bbr063.

Risso D, Schwartz K, Sherlock G, Dudoit S: GC-Content Normalization for RNA-Seq Data. BMC Bioinformatics. 2011, 12 (1): 480-10.1186/1471-2105-12-480.

Hansen KD, Irizarry RA, Wu Z: Removing technical variability in RNA-seq data using conditional quantile normalization. Biostatistics. 2012, 13 (2): 204-216. 10.1093/biostatistics/kxr054.

Zheng W, Chung L, Zhao H: Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics. 2011, 12 (1): 290-10.1186/1471-2105-12-290.

Hansen KD, Brenner SE, Dudoit S: Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res. 2010, 38 (12): e131-10.1093/nar/gkq224.

Li J, Jiang H, Wong WH: Modeling non-uniformity in short-read rates in RNA-Seq data. Genome Biol. 2010, 11 (5): R50-10.1186/gb-2010-11-5-r50.

Griebel T, Zacher B, Ribeca P, Raineri E, Lacroix V, Guigó R, Sammeth M: Modelling and simulating generic RNA-Seq experiments with the flux simulator. Nucleic Acids Res. 2012, 40 (20): 10073-10083. 10.1093/nar/gks666.

Griffith M, Griffith OL, Mwenifumbo J, Goya R, Morrissy AS, Morin RD, Corbett R, Tang MJ, Hou YC, Pugh TJ: Alternative expression analysis by RNA sequencing. Nat Methods. 2010, 7 (10): 843-847. 10.1038/nmeth.1503.

Jiang L, Schlesinger F, Davis CA, Zhang Y, Li R, Salit M, Gingeras TR, Oliver B: Synthetic spike-in standards for RNA-seq experiments. Genome Res. 2011, 21 (9): 1543-1551. 10.1101/gr.121095.111.

Van Verk MC, Hickman R, Pieterse CM, Van Wees S: RNA-Seq: revelation of the messengers. Trends Plant Sci. 2013, 18 (4): 175-179. 10.1016/j.tplants.2013.02.001.

Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25 (9): 1105-1111. 10.1093/bioinformatics/btp120.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The sequence alignment/map format and SAMtools. Bioinformatics. 2009, 25 (16): 2078-2079. 10.1093/bioinformatics/btp352.

Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010, 26 (6): 841-842. 10.1093/bioinformatics/btq033.

Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras JB, Stephens M, Gilad Y, Pritchard JK: Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010, 464 (7289): 768-772. 10.1038/nature08872.

Tarazona S, García-Alcalde F, Dopazo J, Ferrer A, Conesa A: Differential expression in RNA-seq: a matter of depth. Genome Res. 2011, 21 (12): 2213-2223. 10.1101/gr.124321.111.

Di Camillo B, Toffolo G, Nair SK, Greenlund LJ, Cobelli C: Significance analysis of microarray transcript levels in time series experiments. BMC Bioinformatics. 2007, 8 (Suppl 1): S10-10.1186/1471-2105-8-S1-S10.