Hidden multiplicity in exploratory multiway ANOVA: Prevalence and remedies
Abstract
Many psychologists do not realize that exploratory use of the popular multiway analysis of variance harbors a multiple-comparison problem. In the case of two factors, three separate null hypotheses are subject to test (i.e., two main effects and one interaction). Consequently, the probability of at least one Type I error (if all null hypotheses are true) is 14 % rather than 5 %, if the three tests are independent. We explain the multiple-comparison problem and demonstrate that researchers almost never correct for it. To mitigate the problem, we describe four remedies: the omnibus F test, control of the familywise error rate, control of the false discovery rate, and preregistration of the hypotheses.
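The 14 % figure follows from the independence assumption stated above: with three tests at level .05, the probability of at least one Type I error is 1 − (1 − .05)^3 ≈ .143. The sketch below reproduces that arithmetic and illustrates two of the remedies on one set of p values: the Holm step-down procedure as one way to control the familywise error rate, and the Benjamini–Hochberg step-up procedure as one way to control the false discovery rate. The function names and the three p values are hypothetical, chosen only for illustration; this is not the authors' code, and these are not necessarily the exact procedures the article recommends.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """P(at least one Type I error) for n independent tests, all nulls true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure; returns one reject flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The smallest p is compared with alpha/m, the next with alpha/(m-1), ...
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns one reject flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, i in enumerate(order, start=1):
        # Find the largest rank k whose p value satisfies p_(k) <= alpha * k / m
        if p_values[i] <= alpha * rank / m:
            largest_k = rank
    reject = [False] * m
    for i in order[:largest_k]:
        reject[i] = True
    return reject


# Two-factor design: main effect A, main effect B, A x B interaction.
print(f"{familywise_error_rate(3):.0%}")  # 14%

# Hypothetical p values, in the order (A, B, A x B); not taken from any study.
p_values = [0.010, 0.040, 0.030]
print(holm_reject(p_values))                # [True, False, False]
print(benjamini_hochberg_reject(p_values))  # [True, True, True]
```

All three hypothetical p values fall below .05 and would be declared significant without correction. Holm retains only the smallest, whereas Benjamini–Hochberg retains all three, which reflects the weaker error criterion that false discovery rate control imposes.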
