Approaches to multiplicity issues in complex research in microarray analysis

Statistica Neerlandica - Volume 60, Issue 4 - Pages 414-437 - 2006
Daniel Yekutieli1,2, Anat Reiner‐Benaim1, Yoav Benjamini1, Gregory I. Elmer3, Neri Kafkafi3, Noah Letwin4, Norman H. Lee4
1Department of Statistics and Operations Research, Sackler Faculty of Exact Sciences, Tel-Aviv University, Ramat Aviv, P.O.B. 39040, Tel Aviv 61390, Israel
3Maryland Psychiatric Research Center, University of Maryland, Baltimore, MD, USA
4Department of Functional Genomics, The Institute for Genomic Research, Maryland & The George Washington University Medical Center, Washington, DC, USA

Abstract

The multiplicity problem is evident in the simplest form of statistical analysis of gene expression data: the identification of differentially expressed genes. In more complex analyses, the problem is compounded by the multiplicity of hypotheses per gene, so that in some cases it may be necessary to consider testing millions of hypotheses. We present three general approaches for addressing multiplicity in large research problems: (a) use the scalability of false discovery rate (FDR)-controlling procedures; (b) apply FDR-controlling procedures to a selected subset of hypotheses; (c) apply hierarchical FDR-controlling procedures. We also offer a general framework for ensuring reproducible results in complex research, where a researcher faces more than one large research problem. We demonstrate these approaches by analyzing the results of a complex experiment involving the study of gene expression levels in different brain regions across multiple mouse strains.
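As an illustration of what an FDR-controlling procedure looks like in practice, the following is a minimal sketch of the standard Benjamini-Hochberg linear step-up procedure. It is not taken from the article: the function name `benjamini_hochberg`, the parameter `q`, and the example p-values are all assumptions chosen for illustration, and the article's hierarchical and subset-based variants are not reproduced here.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of hypotheses rejected by the BH step-up procedure.

    Hypothetical helper for illustration; `q` is the target FDR level.
    """
    m = len(p_values)
    # Order hypothesis indices by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= k * q / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank

    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


if __name__ == "__main__":
    # Made-up p-values, one per hypothesis (e.g. one per gene).
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, q=0.05))
```

Because the rejection threshold for each ranked p-value is proportional to rank / m, the procedure can be applied unchanged whether m is in the hundreds or in the millions.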
