Optimal significance analysis of microarray data in a class of tests whose null statistic can be constructed

TEST - Volume 21 - Pages 280-300 - 2011
Hironori Fujisawa1, Takayuki Sakaguchi2
1The Institute of Statistical Mathematics, Tachikawa, Tokyo, Japan
2Oita University of Nursing and Health Sciences, Oita, Japan

Abstract

Microarray data often consist of a large number of genes and a small number of replicates. We consider testing the null hypothesis of equal means to detect differentially expressed genes. The p-value for each gene is often estimated from permutation samples of not only the target gene but also the other genes. This method has been widely used and discussed. However, direct use of the permutation method for p-value estimation may not work well, because two types of genes are mixed in the sample: some genes are differentially expressed, whereas others are not. To overcome this difficulty, various methods for appropriately generating null permutation samples have been proposed. In this paper, we consider two classes of test statistics that are naturally modified to null statistics. We then obtain the uniformly most powerful (UMP) unbiased tests within these classes. If the underlying distribution is symmetric, the UMP unbiased test statistic is similar to that proposed by Pan (Bioinformatics 19:1333–1340, 2003). Under another condition, the UMP unbiased test statistic has a different formula with one more degree of freedom and is therefore expected to give a more powerful test and a more accurate p-value estimate from a modified null statistic. In microarray data, because the number of replicates is often small, a difference in the degrees of freedom produces large effects on the power of the test and on the variance of the p-value estimate. Simulation studies and real-data analyses are presented to investigate the performance of the methods.
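The pooled permutation scheme mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' UMP unbiased statistic and not Pan's null statistic: it uses a Welch t-statistic, enumerates every relabelling of the replicates, and pools the permuted statistics of all genes into one null reference set. The function names `welch_t` and `pooled_permutation_pvalues` are hypothetical.

```python
import bisect
import itertools
import math


def welch_t(x, y):
    """Welch two-sample t-statistic for two lists of replicates."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)


def pooled_permutation_pvalues(data, n1):
    """Two-sided permutation p-values with the null pooled across genes.

    data: one list of expression values per gene; the first n1 values
          belong to group 1 and the rest to group 2.
    Assumes no relabelling yields two zero-variance groups.
    """
    n = len(data[0])
    observed = [abs(welch_t(row[:n1], row[n1:])) for row in data]

    # Build the pooled null: every relabelling, applied to every gene.
    null = []
    for idx in itertools.combinations(range(n), n1):
        chosen = set(idx)
        for row in data:
            g1 = [row[i] for i in idx]
            g2 = [row[i] for i in range(n) if i not in chosen]
            null.append(abs(welch_t(g1, g2)))
    null.sort()

    # p-value = fraction of pooled null statistics >= the observed one.
    total = len(null)
    return [(total - bisect.bisect_left(null, t)) / total for t in observed]


genes = [
    [0.0, 0.1, -0.1, 10.0, 10.1, 9.9],  # clearly separated groups
    [0.0, 1.0, -1.0, 0.5, -0.5, 0.2],   # no visible separation
]
pvals = pooled_permutation_pvalues(genes, n1=3)
```

Note that the permuted statistics of the first gene enter the pool used for the second gene. This is the mixing of differentially expressed and non-differentially expressed genes that the abstract identifies as the weakness of the direct approach; the sketch makes no attempt to correct for it.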

References

Azzalini A (1985) A class of distributions which includes the normal ones. Scand J Stat 12:171–178
Choe SE, Boutros M, Michelson AM, Church GM, Halfon MS (2005) Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome Biol 6:R16
Efron B, Tibshirani R, Storey JD, Tusher V (2001) Empirical Bayes analysis of a microarray experiment. J Am Stat Assoc 96:1151–1160
Gao X (2006) Construction of null statistics in permutation-based multiple testing for multi-factorial microarray experiments. Bioinformatics 22:1486–1494
Gottardo R, Raftery AE, Yeung KY, Bumgarner RE (2006) Bayesian robust inference for differential gene expression in microarrays with multiple samples. Biometrics 62:10–18
Ito K, Schull WJ (1964) On the robustness of the \(T_{0}^{2}\) test in multivariate analysis of variance when variance-covariance matrices are not equal. Biometrika 51:71–82
Lehmann EL, Romano JP (2005) Testing statistical hypotheses, 3rd edn. Springer texts in statistics. Springer, New York
Lockhart D, Dong B, Byrne M, Follettie M, Gallo M, Chee M, Mittman M (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14:1675–1680
McLachlan G, Bean R, Jones LBT (2006) A simple implementation of a normal mixture approach to differential gene expression in multiclass microarrays. Bioinformatics 22:1608–1615
Pan W (2003) On the use of permutation in and the performance of a class of nonparametric methods to detect differential gene expression. Bioinformatics 19:1333–1340
Pan W, Lin J, Le CT (2003) A mixture model approach to detecting differentially expressed genes with microarray data. Funct Integr Genomics 3:117–124
Scheid S, Spang R (2006) Permutation filtering: a novel concept for significance analysis of large-scale genomic data. In: Lecture notes comput sci, vol 3909, pp 338–347
Scheid S, Spang R (2007) Compensating for unknown confounders in microarray data analysis using filtered permutations. J Comput Biol 14:669–681
Southworth LK, Kim SK, Owen AB (2009) Properties of balanced permutations. J Comput Biol 16:625–638
Storey JD (2002) A direct approach to false discovery rates. J R Stat Soc B 64:479–498
Storey JD, Tibshirani R (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci USA 100:9440–9445
Tusher V, Tibshirani R, Chu G (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 98:5116–5121
van’t Wout AB, Lehrman GK, Mikheeva SA, O’Keeffe GC, Katze MG, Bumgarner RE, Geiss GK, Mullins JI (2003) Cellular gene expression upon human immunodeficiency virus type 1 infection of CD4+-T-cell lines. J Virol 77:1392–1402
Xie Y, Pan W, Khodursky AB (2005) A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data. Bioinformatics 21:4280–4288
Xu J, Cui X (2008) Robustified MANOVA with applications in detecting differentially expressed genes from oligonucleotide arrays. Bioinformatics 24:1056–1062
Zhao Y, Pan W (2003) Modified nonparametric approaches to detecting differentially expressed genes in replicated microarray experiments. Bioinformatics 19:1046–1054