Statistical significance for genomewide studies

John D. Storey1,2, Robert Tibshirani1
1Department of Biostatistics, University of Washington, Seattle, WA 98195; and Departments of Health Research and Policy and Statistics, Stanford University, Stanford, CA 94305
2Lewis-Sigler Institute for Integrative Genomics

Abstract

With the increase in genomewide experiments and the sequencing of multiple genomes, the analysis of large data sets has become commonplace in biology. Often, thousands of features in a genomewide data set are tested against some null hypothesis, and a number of those features are expected to be significant. Here we propose an approach to measuring statistical significance in these genomewide studies based on the concept of the false discovery rate. This approach offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted. In doing so, a measure of statistical significance called the q value is associated with each tested feature. The q value is similar to the well-known p value, except that it measures significance in terms of the false discovery rate rather than the false positive rate. Our approach avoids a flood of false positive results, while offering a more liberal criterion than has been used in genome scans for linkage.
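As a rough illustration of how a q value can be attached to each tested feature, the following Python sketch turns a list of p values into q values. The abstract does not spell out the estimation procedure, so this is a simplified version under stated assumptions: the function name `q_values` and the fixed tuning parameter `lam = 0.5` are hypothetical choices made here, and the proportion of truly null features is estimated from the p values that exceed `lam`.

```python
def q_values(p_values, lam=0.5):
    """Simplified q-value estimate for a list of p values.

    `lam` is an assumed tuning parameter: p values above it are treated
    as coming mostly from true null features.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Estimated proportion of truly null features, capped at 1.
    pi0 = sum(p > lam for p in p_values) / (m * (1.0 - lam))
    pi0 = min(1.0, pi0)

    # Indices of the p values, ordered from smallest to largest p value.
    order = sorted(range(m), key=lambda i: p_values[i])

    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that q values never
    # decrease as p values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        # Estimated false discovery rate if this p value is the threshold.
        fdr = pi0 * m * p_values[i] / rank
        running_min = min(running_min, fdr)
        q[i] = running_min
    return q


p = [0.01, 0.02, 0.03, 0.04, 0.6, 0.8, 0.9, 0.2]
q = q_values(p)
```

Calling a feature significant when its q value is at most 0.05, for example, would mean that roughly 5% of the features called significant are expected to be false positives, under the assumptions of this sketch.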

