Hidden multiplicity in exploratory multiway ANOVA: Prevalence and remedies

Psychonomic Bulletin & Review - Volume 23 - Pages 640-647 - 2015

Angélique O. J. Cramer¹, Don van Ravenzwaaij², Dora Matzke¹, Helen Steingroever¹, Ruud Wetzels³, Raoul P. P. P. Grasman¹, Lourens J. Waldorp¹, Eric-Jan Wagenmakers¹

¹Psychological Methods, Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands

²Faculty of Science and Information Technology, School of Psychology, University of Newcastle, Callaghan, Australia

³Data Analytics, Price Waterhouse Coopers, Amsterdam, The Netherlands

Abstract

Many psychologists do not realize that exploratory use of the popular multiway analysis of variance harbors a multiple-comparison problem. In the case of two factors, three separate null hypotheses are subject to test (i.e., two main effects and one interaction). Consequently, the probability of at least one Type I error (if all null hypotheses are true) is 14% rather than 5%, if the three tests are independent. We explain the multiple-comparison problem and demonstrate that researchers almost never correct for it. To mitigate the problem, we describe four remedies: the omnibus F test, control of the familywise error rate, control of the false discovery rate, and preregistration of the hypotheses.
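The 14% figure follows from 1 - (1 - 0.05)^3 for three independent tests. The sketch below reproduces that arithmetic and illustrates two of the remedies named in the abstract: control of the familywise error rate (here via Holm's sequentially rejective procedure) and control of the false discovery rate (here via the Benjamini-Hochberg procedure). It is a minimal illustration, not the authors' own code; the function names and the example p-values are hypothetical, and the count of 2^k - 1 tests assumes a full factorial design in which every main effect and interaction is tested.

```python
def num_anova_tests(num_factors: int) -> int:
    """Main effects plus all interactions in a full factorial design (assumed)."""
    return 2 ** num_factors - 1


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """P(at least one Type I error) when all nulls are true and tests are independent."""
    return 1.0 - (1.0 - alpha) ** num_tests


def holm_reject(p_values, alpha: float = 0.05):
    """Holm's step-down procedure: controls the familywise error rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value is compared with alpha/m, the next with alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first non-rejection
    return reject


def benjamini_hochberg_reject(p_values, alpha: float = 0.05):
    """Benjamini-Hochberg step-up procedure: controls the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            largest_rank = rank
    reject = [False] * m
    for idx in order[:largest_rank]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Two-factor ANOVA: two main effects and one interaction.
    k = num_anova_tests(2)
    print(k, round(familywise_error_rate(k), 3))

    # Hypothetical p-values for main effect A, main effect B, and A x B.
    p = [0.010, 0.040, 0.030]
    print("uncorrected:", [pv <= 0.05 for pv in p])
    print("Holm:       ", holm_reject(p))
    print("BH:         ", benjamini_hochberg_reject(p))
```

With these made-up p-values, all three effects pass the uncorrected 5% threshold, Holm retains only the smallest, and Benjamini-Hochberg retains all three, which shows how the choice of error criterion changes the conclusions.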

