Distributional fold change test – a statistical approach for detecting differential expression in microarray experiments
Abstract
Because of the large volume of data and the intrinsic variation of data intensity observed in microarray experiments, different statistical methods have been used to systematically extract biological information and to quantify the associated uncertainty. The simplest method of identifying differentially expressed genes is to evaluate the ratio of average intensities in two different conditions and to consider all genes that differ by more than an arbitrary cut-off value to be differentially expressed. This filtering approach is not a statistical test, and it yields no associated value that indicates the level of confidence in designating a gene as differentially expressed or not differentially expressed. At the same time, the fold change by itself provides valuable information, and it is important to find unambiguous ways of using this information in the treatment of expression data.

A new method of finding differentially expressed genes, called the distributional fold change (DFC) test, is introduced. The method is based on an analysis of the intensity distribution of all microarray probe sets mapped to a three-dimensional feature space composed of average expression level, average difference of gene expression, and total variance. The proposed method allows one to rank each feature by its signal-to-noise ratio and to ascertain, for each feature, the confidence level and power for being differentially expressed. The performance of the new method was evaluated using the total and partial area under the receiver operating characteristic (ROC) curve (AUC). It was tested on 11 data sets from the Gene Expression Omnibus (GEO) database with independently verified differentially expressed genes and compared with the t-test and the shrinkage t-test. Overall, the DFC test performed best: on average it had higher sensitivity and partial AUC, and the improvement was most prominent in the low range of differentially expressed features, which is typical of formalin-fixed paraffin-embedded sample sets.
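To make the three quantities named above concrete, the following minimal sketch computes, for each probe set, the average expression level, the average difference between two conditions, and a total variance, and then applies the simple fold-change filter described in the first paragraph. It is not the DFC test itself. The function names, the toy data, and the definition of total variance as the sum of the two per-condition sample variances are assumptions made for illustration; the abstract does not specify them. Intensities are assumed to be on a log2 scale.

```python
import numpy as np

def feature_space(x_a, x_b):
    """Map each probe set to (average level, average difference, total variance).

    x_a, x_b: 2-D arrays of log2 intensities, shape (n_probesets, n_samples),
    one array per condition.
    ASSUMPTION: "total variance" is taken as the sum of the two per-condition
    sample variances; the abstract does not give the exact definition.
    """
    mean_a = x_a.mean(axis=1)
    mean_b = x_b.mean(axis=1)
    avg_level = (mean_a + mean_b) / 2.0
    avg_diff = mean_b - mean_a  # log2 fold change between conditions
    total_var = x_a.var(axis=1, ddof=1) + x_b.var(axis=1, ddof=1)
    return avg_level, avg_diff, total_var

def fold_change_filter(avg_diff, cutoff=1.0):
    """Flag probe sets whose |log2 fold change| exceeds an arbitrary cut-off."""
    return np.abs(avg_diff) > cutoff

# Hypothetical toy data: 3 probe sets, 3 samples per condition.
x_a = np.array([[8.0, 8.2, 7.8],
                [5.0, 5.1, 4.9],
                [10.0, 12.0, 8.0]])
x_b = np.array([[10.0, 10.2, 9.8],
                [5.1, 5.0, 5.2],
                [11.5, 9.5, 13.5]])

level, diff, var = feature_space(x_a, x_b)
flags = fold_change_filter(diff, cutoff=1.0)
```

In this toy example the third probe set passes the cut-off even though its total variance is far larger than that of the first, which illustrates why a cut-off alone carries no measure of confidence.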
The distributional fold change test is an effective method for finding and ranking differentially expressed probe sets on microarrays. The test is advantageous for data sets that use formalin-fixed paraffin-embedded samples, or for other systems in which degradation effects diminish the applicability of correlation-adjusted methods to the whole feature set.
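The evaluation described in the abstract compares rankings by the total and partial area under the ROC curve. A minimal sketch of that kind of comparison is given here, assuming each feature has a score (larger meaning more likely differentially expressed) and a 0/1 label marking independently verified features. The function names, the toy scores, and the choice of a false-positive-rate limit of 0.5 are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (fpr, tpr) arrays obtained by sweeping a threshold over scores.

    scores: larger means "more likely differentially expressed".
    labels: 1 for verified differentially expressed features, 0 otherwise.
    Tied scores are not handled specially in this sketch.
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y = np.asarray(labels)[order]
    tpr = np.concatenate(([0.0], np.cumsum(y) / y.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(1 - y) / (1 - y).sum()))
    return fpr, tpr

def partial_auc(fpr, tpr, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr].

    Assumes 0 < max_fpr <= 1; max_fpr=1.0 gives the total AUC.
    """
    # Index of the first ROC point whose FPR lies beyond the limit.
    stop = np.searchsorted(fpr, max_fpr, side="right")
    x, y = fpr[:stop], tpr[:stop]
    if x[-1] < max_fpr:
        # Linearly interpolate the curve at exactly max_fpr.
        x0, y0, x1, y1 = fpr[stop - 1], tpr[stop - 1], fpr[stop], tpr[stop]
        y_end = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
        x, y = np.append(x, max_fpr), np.append(y, y_end)
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Hypothetical toy ranking: 3 verified features among 6.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]

fpr, tpr = roc_points(scores, labels)
total = partial_auc(fpr, tpr, max_fpr=1.0)
partial = partial_auc(fpr, tpr, max_fpr=0.5)
```

Restricting the area to a low false-positive range emphasizes the top of the ranking, which is the part that matters when only a small number of features are expected to be differentially expressed.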