On the Measure-Theoretic Premises of Bayes Factor and Full Bayesian Significance Tests: A Critical Reevaluation
Abstract
The Full Bayesian Significance Test (FBST) and the Bayesian evidence value have recently received increasing attention across a variety of sciences, including psychology. Ly and Wagenmakers (2021) provided a critical evaluation of the method and concluded that it suffers from four problems, which are mostly attributed to the asymptotic relationship between the Bayesian evidence value and the frequentist p-value. While Ly and Wagenmakers (2021) tackle an important question about how best to test statistical hypotheses in the cognitive sciences, this paper shows that their arguments rest on a specific measure-theoretic premise. The identified problems hold only under a specific class of prior distributions, which is required only when adopting a Bayes factor test. However, the FBST explicitly avoids this premise, which resolves the problems in practical data analysis. In summary, the analysis leads to the more important question of whether precise point null hypotheses are realistic for scientific research, and a shift towards the Hodges-Lehmann paradigm may be an appealing solution when there is doubt about the appropriateness of a precise hypothesis.
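The relationship between the evidence value and the p-value, and the different prior requirements of the two tests, can be seen in the simplest textbook case. The Python sketch below is an illustration under assumptions that are not taken from the abstract: a normal model with known standard deviation sigma, a flat prior for the FBST, and, for the Bayes factor, a prior that places all H0 mass on the single point theta0 and a N(theta0, tau^2) prior under H1. All function names are hypothetical. Under these assumptions the FBST e-value coincides exactly with the two-sided z-test p-value. The Bayes factor BF01, by contrast, needs positive prior probability on the point theta0 (any continuous prior gives that point probability zero), and at a fixed p-value it grows with n, the Jeffreys-Lindley behaviour discussed by Robert (2014).

```python
import math
from statistics import NormalDist


def fbst_ev_normal_mean(xbar: float, theta0: float, sigma: float, n: int) -> float:
    """Hypothetical helper: FBST e-value for H0: theta = theta0.

    Assumes a known sigma and a flat prior, so the posterior is
    N(xbar, sigma^2 / n). The tangential set
    T = {theta : p(theta | x) > p(theta0 | x)} is the open interval
    |theta - xbar| < |theta0 - xbar|, and ev = 1 - P(T | x).
    """
    se = sigma / math.sqrt(n)
    posterior = NormalDist(mu=xbar, sigma=se)
    half_width = abs(theta0 - xbar)
    prob_tangential = posterior.cdf(xbar + half_width) - posterior.cdf(xbar - half_width)
    return 1.0 - prob_tangential


def two_sided_p_value(xbar: float, theta0: float, sigma: float, n: int) -> float:
    """Frequentist two-sided z-test p-value for H0: theta = theta0."""
    z = (xbar - theta0) / (sigma / math.sqrt(n))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def bayes_factor_01(xbar: float, theta0: float, sigma: float, n: int, tau: float) -> float:
    """Hypothetical helper: BF01 for H0: theta = theta0 vs H1: theta ~ N(theta0, tau^2).

    H0 concentrates all of its prior mass on the single point theta0, so the
    marginal density of xbar under H0 is N(theta0, sigma^2 / n); under H1 it
    is N(theta0, sigma^2 / n + tau^2).
    """
    se2 = sigma ** 2 / n
    m0 = NormalDist(mu=theta0, sigma=math.sqrt(se2)).pdf(xbar)
    m1 = NormalDist(mu=theta0, sigma=math.sqrt(se2 + tau ** 2)).pdf(xbar)
    return m0 / m1


if __name__ == "__main__":
    # Hold the z-statistic fixed at 2 while the sample size grows.
    z, sigma, tau, theta0 = 2.0, 1.0, 1.0, 0.0
    for n in (10, 100, 1_000, 10_000):
        xbar = theta0 + z * sigma / math.sqrt(n)
        ev = fbst_ev_normal_mean(xbar, theta0, sigma, n)
        p = two_sided_p_value(xbar, theta0, sigma, n)
        bf01 = bayes_factor_01(xbar, theta0, sigma, n, tau)
        print(f"n={n:>6}  ev={ev:.4f}  p={p:.4f}  BF01={bf01:.3f}")
```

In this setting ev and p stay equal (about 0.0455) for every n, while BF01 rises from below 1 at n = 10 to above 10 at n = 10,000. The choice of tau = 1 is arbitrary; other values change the size of BF01 but not its growth with n.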
References
Anderson, S., & Hauck, W.W. (1983). A new procedure for testing equivalence in comparative bioavailability and other clinical trials. Communications in Statistics - Theory and Methods, 12(23), 2663–2692.
Bauer, H. (2001). Measure and integration theory. Berlin: De Gruyter.
Berger, J. (1985). Statistical Decision Theory and Bayesian Analysis. New York: Springer.
Berger, J., & Delampady, M. (1987). Testing precise hypotheses. Statistical Science, 2(3), 317–335.
Berger, J., & Wolpert, R.L. (1988). The likelihood principle. Hayward, CA: Institute of Mathematical Statistics.
Berger, J.O., & Guglielmi, A. (2001). Bayesian and conditional frequentist testing of a parametric model versus nonparametric alternatives. Journal of the American Statistical Association, 96(453), 174–184.
Berger, J.O., & Pericchi, L.R. (1996). On the justification of default and intrinsic Bayes factors. In Modelling and Prediction Honoring Seymour Geisser, pages 276–293. New York: Springer.
Bernardo, J. (1999). Nested hypothesis testing: the Bayesian reference criterion. In J. Bernardo, J. Berger, A. Dawid, & A. Smith (Eds.) Bayesian Statistics (Vol. 6), pages 101–130 (with discussion). Oxford University Press, Valencia.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304–1312.
DeGroot, M.H. (1973). Doing what comes naturally: interpreting a tail area as a posterior probability or as a likelihood ratio. Journal of the American Statistical Association, 68(344), 966–969.
Diniz, M., Pereira, C.A.B., Polpo, A., Stern, J.M., & Wechsler, S. (2012). Relationship between Bayesian and frequentist significance indices. International Journal for Uncertainty Quantification, 2(2), 161–172.
Edwards, A. (1992). Likelihood, expanded edn. Baltimore, MD: The Johns Hopkins University Press.
Etz, A., & Wagenmakers, E.-J. (2015). J. B. S. Haldane’s contribution to the Bayes factor hypothesis test. Statistical Science, 32(2), 313–329.
Feller, W. (1968). An introduction to probability theory and its applications, 3rd edn. Vol. I. New York: John Wiley & Sons.
Good, I.J. (1950). Probability and the weighing of evidence. London: Charles Griffin.
Good, I.J. (1992). The Bayes/non-Bayes compromise: A brief review. Journal of the American Statistical Association, 87(419), 597–606.
Good, I.J. (1993). C397. Refutation and rejection versus inexactification, and other comments concerning terminology. Journal of Statistical Computation and Simulation, 47(1-2), 91–92.
Good, I.J. (1994). C420. The existence of sharp null hypotheses. Journal of Statistical Computation and Simulation, 49(3-4), 241–242.
Gronau, Q.F., Ly, A., & Wagenmakers, E.-J. (2019). Informed Bayesian t-tests. The American Statistician, 74(2), 137–143.
Held, L., & Sabanés-Bové, D. (2014). Applied Statistical Inference. Berlin, Heidelberg: Springer.
Hodges, J.L., & Lehmann, E.L. (1954). Testing the approximate validity of statistical hypotheses. Journal of the Royal Statistical Society, Series B (Methodological), 16(2), 261–268.
Jeffreys, H. (1961). Theory of probability, 3rd edn. Oxford: Oxford University Press.
Jeffreys, H. (1980). Some general points in probability theory. In A. Zellner J.B. Kadane (Eds.) Bayesian Analysis in Econometrics and Statistics: Essays in Honor of Harold Jeffreys, pages 451–453. North-Holland Publishing Company, Amsterdam, The Netherlands.
Johnson, V.E., & Rossell, D. (2010). On the use of non-local prior densities in Bayesian hypothesis tests. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 72(2), 143–170.
Kelter, R. (2020). Analysis of Bayesian posterior significance and effect size indices for the two-sample t-test to support reproducible medical research. BMC Medical Research Methodology, 20, 88.
Ly, A., & Wagenmakers, E.-J. (2021). A critical evaluation of the FBST ev for Bayesian hypothesis testing. PsyArxiv Preprint: https://psyarxiv.com/x9t6n.
Morey, R.D., & Rouder, J.N. (2011). Bayes factor approaches for testing interval null hypotheses. Psychological Methods, 16(4), 406–419.
Pereira, C.A.d.B., & Stern, J.M. (1999). Evidence and credibility: Full Bayesian significance test for precise hypotheses. Entropy, 1(4), 99–110.
Pereira, C.A.d.B., Stern, J.M., & Wechsler, S. (2008). Can a significance test be genuinely Bayesian? Bayesian Analysis, 3(1), 79–100.
Rao, C.R., & Lovric, M.M. (2016). Testing point null hypothesis of a normal mean and the truth: 21st Century perspective. Journal of Modern Applied Statistical Methods, 15(2), 2–21.
Robert, C.P. (2007). The Bayesian Choice, 2nd edn. New York: Springer.
Robert, C.P. (2014). On the Jeffreys-Lindley paradox. Philosophy of Science, 81(2), 216–232.
Robert, C.P. (2016). The expected demise of the Bayes factor. Journal of Mathematical Psychology, 72, 33–37.
Rousseau, J. (2007). Approximating interval hypothesis: p-values and Bayes factors. In J. Bernardo, J. Berger, A. Dawid, & A. Smith (Eds.) Bayesian Statistics (Vol. 8), pages 417–452. Oxford University Press, Valencia.
Savage, L.J. (1961). The subjective basis of statistical practice. Technical report, Department of Statistics, University of Michigan.
Savage, L.J., Barnard, G., Cornfield, J., Bross, I., Box, G.E.P., Good, I.J., Lindley, D.V., Clunies-Ross, C.W., Pratt, J.W., Levene, H., Goldman, T., Dempster, A.P., Kempthorne, O., & Birnbaum, A. (1962). On the foundations of statistical inference: Discussion. Journal of the American Statistical Association, 57(298), 307–326.
Schervish, M.J. (1995). Theory of statistics. New York: Springer.
Sellke, T., Bayarri, M.J., & Berger, J.O. (2001). Calibration of p values for testing precise null hypotheses. The American Statistician, 55(1), 62–71.
van der Vaart, A. (1998). Asymptotic Statistics. Cambridge: Cambridge University Press.
Wang, M., & Liu, G. (2016). A simple two-sample Bayesian t-test for hypothesis testing. The American Statistician, 70(2), 195–201.
Wrinch, D., & Jeffreys, H. (1921). XLII. On certain fundamental principles of scientific inquiry. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 42(249), 369–390.