Reporting FDR analogous confidence intervals for the log fold change of differentially expressed genes

BMC Bioinformatics - Volume 12 - Pages 1-9 - 2011
Klaus Jung1, Tim Friede1, Tim Beißbarth1
1Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany

Abstract

Gene expression experiments are common in molecular biology, for example to identify genes that play a certain role in a specified biological framework. For that purpose, expression levels of several thousand genes are measured simultaneously using DNA microarrays. When two distinct groups of tissue samples are compared to detect differentially expressed genes, one statistical test per gene is performed, and the resulting p-values are adjusted to control the false discovery rate. In addition, the expression change of each gene is quantified by some effect measure, typically the log fold change. In certain cases, however, a gene with a significant p-value can have a rather small fold change, while in other cases a non-significant gene can have a rather large fold change. The biological relevance of a change in gene expression can be judged more intuitively by a fold change than merely by a p-value. Therefore, confidence intervals for the log fold change that accompany the adjusted p-values are desirable. In a new approach, we employ an existing algorithm for adjusting confidence intervals in the case of high-dimensional data and apply it to a widely used linear model for microarray data. Furthermore, we adopt a concept of different relevance categories for effects in clinical trials to assess the biological relevance of genes in microarray experiments. In a brief simulation study, the properties of the adjusting algorithm are maintained when it is combined with the linear model for microarray data. In two cancer data sets, the adjusted confidence intervals can indicate significance of large fold changes and distinguish them from other large but non-significant fold changes. Adjusting the confidence intervals also corrects the assessment of biological relevance. Our new combination approach and the categorization of fold changes facilitate the selection of genes in microarray experiments and help to interpret their biological relevance.
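The idea of confidence intervals that accompany FDR-adjusted p-values can be illustrated with a minimal sketch. It assumes that the adjusting algorithm is the false-coverage-rate procedure of Benjamini and Yekutieli (2005): select R of m genes with the Benjamini-Hochberg procedure at level q, then report marginal intervals at level 1 - R*q/m for the selected genes. As a simplification, the sketch uses a normal approximation where the linear model for microarray data would supply a moderated t-statistic. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import NormalDist


def bh_select(pvalues, q=0.05):
    """Return sorted indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value lies below its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])


def fcr_adjusted_intervals(log_fc, std_err, q=0.05):
    """Adjusted confidence intervals for the log fold change of selected genes.

    Assumption: a normal approximation replaces the moderated t-distribution.
    Returns a dict mapping gene index -> (lower, upper).
    """
    nd = NormalDist()
    m = len(log_fc)
    # Two-sided p-value per gene from the z-statistic.
    pvalues = [2 * (1 - nd.cdf(abs(e / s))) for e, s in zip(log_fc, std_err)]
    selected = bh_select(pvalues, q)
    r = len(selected)
    if r == 0:
        return {}
    # Marginal level 1 - r*q/m, so each tail gets r*q/(2m).
    z = nd.inv_cdf(1 - r * q / (2 * m))
    return {
        i: (log_fc[i] - z * std_err[i], log_fc[i] + z * std_err[i])
        for i in selected
    }


if __name__ == "__main__":
    # Hypothetical log fold changes and standard errors for six genes.
    log_fc = [2.5, 0.1, -1.8, 0.3, 1.2, -0.2]
    std_err = [0.4, 0.3, 0.4, 0.5, 0.6, 0.3]
    for gene, (lo, hi) in fcr_adjusted_intervals(log_fc, std_err).items():
        print(f"gene {gene}: [{lo:.2f}, {hi:.2f}]")
```

Because the interval level depends on the number of selected genes, the adjusted intervals are wider than unadjusted 95% intervals whenever fewer than all genes are selected. An adjusted interval that excludes zero and also lies beyond a chosen relevance threshold would then support both significance and biological relevance.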
