A reanalysis of mouse ENCODE comparative gene expression data

F1000Research - Volume 4 - Page 121
Yoav Gilad1, Orna Mizrahi-Man1
1Department of Human Genetics, University of Chicago, Chicago, IL 60637

Abstract

The Mouse ENCODE Consortium recently reported that comparative gene expression data from human and mouse tend to cluster more by species than by tissue. This observation was surprising, as it contradicted much of the previously collected comparative gene regulatory data, as well as the common notion that major developmental pathways are highly conserved across a wide range of species, particularly across mammals. Here we show that the Mouse ENCODE gene expression data were collected using a flawed study design, which confounded sequencing batch (namely, the assignment of samples to sequencing flowcells and lanes) with species. When we account for the batch effect, the corrected comparative gene expression data from human and mouse tend to cluster by tissue, not by species.
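To make the notion of confounding concrete, the following minimal sketch checks whether a sequencing batch variable is fully nested within species in a sample sheet. It is an illustration only, not the analysis performed in the paper: the function names (`mixed_batches`, `is_fully_confounded`), the field names, and both toy sample sheets are hypothetical and do not reproduce the Mouse ENCODE design.

```python
from collections import defaultdict


def mixed_batches(samples, factor="species", batch="batch"):
    """Map each batch that contains more than one level of `factor`
    to the sorted list of levels found in it."""
    levels = defaultdict(set)
    for sample in samples:
        levels[sample[batch]].add(sample[factor])
    return {b: sorted(v) for b, v in levels.items() if len(v) > 1}


def is_fully_confounded(samples, factor="species", batch="batch"):
    """True when no batch mixes levels of `factor`, so a batch effect
    cannot be separated from the effect of `factor`."""
    return not mixed_batches(samples, factor, batch)


# Hypothetical design: every run holds a single species (confounded).
confounded = [
    {"species": "human", "tissue": "heart", "batch": "run1"},
    {"species": "human", "tissue": "liver", "batch": "run1"},
    {"species": "mouse", "tissue": "heart", "batch": "run2"},
    {"species": "mouse", "tissue": "liver", "batch": "run2"},
]

# Hypothetical design: every run holds both species and both tissues.
balanced = [
    {"species": "human", "tissue": "heart", "batch": "run1"},
    {"species": "mouse", "tissue": "liver", "batch": "run1"},
    {"species": "mouse", "tissue": "heart", "batch": "run2"},
    {"species": "human", "tissue": "liver", "batch": "run2"},
]

print(is_fully_confounded(confounded))  # species and batch cannot be separated
print(is_fully_confounded(balanced))    # batch can be modeled separately
```

In the first design, any difference between runs is indistinguishable from a difference between species, which is the kind of structure the abstract describes. The same check can be run against tissue by passing `factor="tissue"`.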
