A New Bayesian Two-Sample t Test and Solution to the Behrens–Fisher Problem Based on Gaussian Mixture Modelling with Known Allocations
Statistics in Biosciences - 2021
Abstract
Testing for differences between a treatment group and a control group is common practice in biomedical research, for example in randomized controlled trials (RCTs). The standard two-sample t test relies on null hypothesis significance testing (NHST) via p values, which has several drawbacks. Bayesian alternatives based on the Bayes factor were introduced recently, but the Bayes factor has limitations of its own. This paper introduces an alternative to current Bayesian two-sample t tests by interpreting the underlying model as a two-component Gaussian mixture with known allocations, in which the quantity of interest is the effect size, the quantity most relevant in clinical research. Unlike approaches based on p values or the Bayes factor, the proposed method focuses on estimation under uncertainty rather than on explicit hypothesis testing. A Gibbs sampler produces the posterior distribution of the effect size, which can then be used either for estimation under uncertainty or, if desired, for explicit hypothesis testing based on the region of practical equivalence (ROPE). An illustrative example, theoretical results and a simulation study show the usefulness of the proposed method, and the test is available in the R package bayest. In sum, the new Bayesian two-sample t test provides a solution to the Behrens–Fisher problem based on Gaussian mixture modelling.
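The workflow summarized in the abstract (Gibbs sampling of group means and variances, then a posterior for the effect size, then a ROPE check) can be illustrated with a minimal Python sketch. This is not the implementation in the R package bayest. The priors (independent normal priors on the means, inverse-gamma priors on the variances), the hyperparameters `m0`, `s0_sq`, `a0`, `b0`, the effect size definition (mean difference divided by the root of the average of the two variances), the ROPE bounds of ±0.1, and the function names `gibbs_effect_size` and `rope_probability` are all assumptions made for illustration.

```python
import numpy as np


def gibbs_effect_size(y1, y2, n_iter=4000, burn_in=1000,
                      m0=0.0, s0_sq=100.0, a0=0.01, b0=0.01, seed=0):
    """Gibbs sampler for two Gaussian groups with separate means and variances.

    Group membership is known, so no allocation step is needed.
    Assumed priors: mu_k ~ N(m0, s0_sq), sigma_k^2 ~ InvGamma(a0, b0).
    Returns posterior draws of the (assumed) effect size
    delta = (mu_1 - mu_2) / sqrt((sigma_1^2 + sigma_2^2) / 2).
    """
    rng = np.random.default_rng(seed)
    groups = [np.asarray(y1, dtype=float), np.asarray(y2, dtype=float)]
    mu = np.array([g.mean() for g in groups])
    sigma_sq = np.array([g.var(ddof=1) for g in groups])
    draws = np.empty(n_iter)

    for t in range(n_iter):
        for k, g in enumerate(groups):
            n = g.size
            # Full conditional of mu_k given sigma_k^2 is normal.
            precision = 1.0 / s0_sq + n / sigma_sq[k]
            mean = (m0 / s0_sq + g.sum() / sigma_sq[k]) / precision
            mu[k] = rng.normal(mean, np.sqrt(1.0 / precision))
            # Full conditional of sigma_k^2 given mu_k is inverse-gamma;
            # draw a gamma variate with scale 1/rate and invert it.
            shape = a0 + 0.5 * n
            rate = b0 + 0.5 * np.sum((g - mu[k]) ** 2)
            sigma_sq[k] = 1.0 / rng.gamma(shape, 1.0 / rate)
        draws[t] = (mu[0] - mu[1]) / np.sqrt(0.5 * (sigma_sq[0] + sigma_sq[1]))

    return draws[burn_in:]


def rope_probability(draws, rope=(-0.1, 0.1)):
    """Share of posterior draws that fall inside the ROPE (bounds are assumed)."""
    draws = np.asarray(draws)
    return float(np.mean((draws >= rope[0]) & (draws <= rope[1])))


# Simulated data with unequal variances (the Behrens-Fisher setting).
data_rng = np.random.default_rng(1)
treatment = data_rng.normal(1.0, 1.0, size=200)
control = data_rng.normal(0.0, 2.0, size=200)

delta = gibbs_effect_size(treatment, control)
print("posterior mean of delta:", delta.mean())
print("95% credible interval:", np.percentile(delta, [2.5, 97.5]))
print("posterior mass inside ROPE:", rope_probability(delta))
```

Because each group has its own variance parameter, the sketch does not require equal variances. A small posterior mass inside the ROPE would speak against practical equivalence of the two groups, while a mass near one would support it.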