A practical solution to the pervasive problems of p values
References
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
Anscombe, F. J. (1954). Fixed-sample-size analysis of sequential observations. Biometrics, 10, 89–100.
Anscombe, F. J. (1963). Sequential medical trials. Journal of the American Statistical Association, 58, 365–383.
Armitage, P. (1960). Sequential medical trials. Springfield, IL: Thomas.
Armitage, P., McPherson, C. K., & Rowe, B. C. (1969). Repeated significance tests on accumulating data. Journal of the Royal Statistical Society: Series A, 132, 235–244.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437.
Basu, D. (1964). Recovery of ancillary information. Sankhya: Series A, 26, 3–16.
Bayarri, M.-J., & Berger, J. O. (2004). The interplay of Bayesian and frequentist analysis. Statistical Science, 19, 58–80.
Berger, J. O. (1985). Statistical decision theory and Bayesian analysis (2nd ed.). New York: Springer.
Berger, J. O. (2003). Could Fisher, Jeffreys and Neyman have agreed on testing? Statistical Science, 18, 1–32.
Berger, J. O., & Berry, D. A. (1988a). The relevance of stopping rules in statistical inference. In S. S. Gupta & J. O. Berger (Eds.), Statistical decision theory and related topics IV (Vol. 1, pp. 29–72). New York: Springer.
Berger, J. O., & Berry, D. A. (1988b). Statistical analysis and the illusion of objectivity. American Scientist, 76, 159–165.
Berger, J. O., Boukai, B., & Wang, Y. (1997). Unified frequentist and Bayesian testing of a precise hypothesis (with discussion). Statistical Science, 12, 133–160.
Berger, J. O., Brown, L., & Wolpert, R. (1994). A unified conditional frequentist and Bayesian test for fixed and sequential hypothesis testing. Annals of Statistics, 22, 1787–1807.
Berger, J. O., & Mortera, J. (1999). Default Bayes factors for nonnested hypothesis testing. Journal of the American Statistical Association, 94, 542–554.
Berger, J. O., & Pericchi, L. R. (1996). The intrinsic Bayes factor for model selection and prediction. Journal of the American Statistical Association, 91, 109–122.
Berger, J. O., & Sellke, T. (1987). Testing a point null hypothesis: The irreconcilability of p values and evidence. Journal of the American Statistical Association, 82, 112–139.
Berger, J. O., & Wolpert, R. L. (1988). The likelihood principle (2nd ed.). Hayward, CA: Institute of Mathematical Statistics.
Birnbaum, A. (1962). On the foundations of statistical inference (with discussion). Journal of the American Statistical Association, 53, 259–326.
Birnbaum, A. (1977). The Neyman–Pearson theory as decision theory, and as inference theory; with a criticism of the Lindley–Savage argument for Bayesian theory. Synthese, 36, 19–49.
Box, G. E. P., & Tiao, G. C. (1973). Bayesian inference in statistical analysis. Reading, MA: Addison-Wesley.
Burdette, W. J., & Gehan, E. A. (1970). Planning and analysis of clinical studies. Springfield, IL: Thomas.
Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). New York: Springer.
Busemeyer, J. R., & Stout, J. C. (2002). A contribution of cognitive decision models to clinical assessment: Decomposing performance on the Bechara gambling task. Psychological Assessment, 14, 253–262.
Christensen, R. (2005). Testing Fisher, Neyman, Pearson, and Bayes. American Statistician, 59, 121–126.
Cornfield, J. (1966). Sequential trials, sequential analysis, and the likelihood principle. American Statistician, 20, 18–23.
Cortina, J. M., & Dunlap, W. P. (1997). On the logic and purpose of significance testing. Psychological Methods, 2, 161–172.
Cox, D. R. (1958). Some problems connected with statistical inference. Annals of Mathematical Statistics, 29, 357–372.
Cox, D. R. (1971). The choice between alternative ancillary statistics. Journal of the Royal Statistical Society: Series B, 33, 251–255.
Cox, R. T. (1946). Probability, frequency and reasonable expectation. American Journal of Physics, 14, 1–13.
Cumming, G. (2007). Replication and p values: p values predict the future vaguely, but confidence intervals do better. Manuscript submitted for publication.
D’Agostini, G. (1999). Teaching statistics in the physics curriculum: Unifying and clarifying role of subjective probability. American Journal of Physics, 67, 1260–1268.
Dawid, A. P. (1984). Statistical theory: The prequential approach. Journal of the Royal Statistical Society: Series A, 147, 278–292.
De Finetti, B. (1974). Theory of probability: A critical introductory treatment (Vols. 1 & 2; A. Machí & A. Smith, Trans.). London: Wiley.
Diamond, G. A., & Forrester, J. S. (1983). Clinical trials and statistical verdicts: Probable grounds for appeal. Annals of Internal Medicine, 98, 385–394.
Dickey, J. M. (1973). Scientific reporting and personal probabilities: Student’s hypothesis. Journal of the Royal Statistical Society: Series B, 35, 285–305.
Dickey, J. M. (1977). Is the tail area useful as an approximate Bayes factor? Journal of the American Statistical Association, 72, 138–142.
Dixon, P. (2003). The p value fallacy and how to avoid it. Canadian Journal of Experimental Psychology, 57, 189–202.
Djurić, P. M. (1998). Asymptotic MAP criteria for model selection. IEEE Transactions on Signal Processing, 46, 2726–2735.
Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70, 193–242.
Efron, B. (2005). Bayesians, frequentists, and scientists. Journal of the American Statistical Association, 100, 1–5.
Efron, B., & Tibshirani, R. (1997). Improvements on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association, 92, 548–560.
Feller, W. (1940). Statistical aspects of ESP. Journal of Parapsychology, 4, 271–298.
Feller, W. (1970). An introduction to probability theory and its applications: Vol. 1 (2nd ed.). New York: Wiley.
Fine, T. L. (1973). Theories of probability: An examination of foundations. New York: Academic Press.
Firth, D., & Kuha, J. (1999). Comments on “A critique of the Bayesian information criterion for model selection.” Sociological Methods & Research, 27, 398–402.
Fisher, R. A. (1934). Statistical methods for research workers (5th ed.). London: Oliver & Boyd.
Fisher, R. A. (1935a). The design of experiments. Edinburgh: Oliver & Boyd.
Fisher, R. A. (1935b). The logic of inductive inference (with discussion). Journal of the Royal Statistical Society, 98, 39–82.
Fisher, R. A. (1958). Statistical methods for research workers (13th ed.). New York: Hafner.
Freireich, E. J., Gehan, E., Frei, E., III, Schroeder, L. R., Wolman, I. J., Anbari, R., et al. (1963). The effect of 6-mercaptopurine on the duration of steroid-induced remissions in acute leukemia: A model for evaluation of other potentially useful therapy. Blood, 21, 699–716.
Frick, R. W. (1996). The appropriate use of null hypothesis testing. Psychological Methods, 1, 379–390.
Friedman, L. M., Furberg, C. D., & DeMets, D. L. (1998). Fundamentals of clinical trials (3rd ed.). New York: Springer.
Galavotti, M. C. (2005). A philosophical introduction to probability. Stanford: CSLI Publications.
Geisser, S. (1975). The predictive sample reuse method with applications. Journal of the American Statistical Association, 70, 320–328.
Gelman, A., & Rubin, D. B. (1999). Evaluating and using statistical methods in the social sciences. Sociological Methods & Research, 27, 403–410.
Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311–339). Hillsdale, NJ: Erlbaum.
Gigerenzer, G. (1998). We need statistical thinking, not statistical rituals. Behavioral & Brain Sciences, 21, 199–200.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (Eds.) (1996). Markov chain Monte Carlo in practice. Boca Raton, FL: Chapman & Hall/CRC.
Gill, J. (2002). Bayesian methods: A social and behavioral sciences approach. Boca Raton, FL: CRC Press.
Glover, S., & Dixon, P. (2004). Likelihood ratios: A simple and flexible statistic for empirical psychologists. Psychonomic Bulletin & Review, 11, 791–806.
Good, I. J. (1983). Good thinking: The foundations of probability and its applications. Minneapolis: University of Minnesota Press.
Good, I. J. (1985). Weight of evidence: A brief survey. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, & A. F. M. Smith (Eds.), Bayesian statistics 2: Proceedings of the Second Valencia International Meeting, September 6–10, 1983 (pp. 249–269). Amsterdam: North-Holland.
Goodman, S. N. (1993). p values, hypothesis tests, and likelihood: Implications for epidemiology of a neglected historical debate. American Journal of Epidemiology, 137, 485–496.
Grünwald, P. [D.] (2000). Model selection based on minimum description length. Journal of Mathematical Psychology, 44, 133–152.
Grünwald, P. D., Myung, I. J., & Pitt, M. A. (Eds.) (2005). Advances in minimum description length: Theory and applications. Cambridge, MA: MIT Press.
Hagen, R. L. (1997). In praise of the null hypothesis statistical test. American Psychologist, 52, 15–24.
Hannan, E. J. (1980). The estimation of the order of an ARMA process. Annals of Statistics, 8, 1071–1081.
Helland, I. S. (1995). Simple counterexamples against the conditionality principle. American Statistician, 49, 351–356.
Hill, B. M. (1985). Some subjective Bayesian considerations in the selection of models. Econometric Reviews, 4, 191–246.
Howson, C., & Urbach, P. (2005). Scientific reasoning: The Bayesian approach (3rd ed.). Chicago: Open Court.
Hubbard, R., & Bayarri, M.-J. (2003). Confusion over measures of evidence (p’s) versus errors (α’s) in classical statistical testing. American Statistician, 57, 171–182.
Jaynes, E. T. (1968). Prior probabilities. IEEE Transactions on Systems Science & Cybernetics, 4, 227–241.
Jaynes, E. T. (2003). Probability theory: The logic of science. Cambridge: Cambridge University Press.
Jeffreys, H. (1961). Theory of probability. Oxford: Oxford University Press.
Jennison, C., & Turnbull, B. W. (1990). Statistical approaches to interim monitoring of medical trials: A review and commentary. Statistical Science, 5, 299–317.
Kadane, J. B., Schervish, M. J., & Seidenfeld, T. (1996). Reasoning to a foregone conclusion. Journal of the American Statistical Association, 91, 1228–1235.
Karabatsos, G. (2006). Bayesian nonparametric model selection and model testing. Journal of Mathematical Psychology, 50, 123–148.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 377–395.
Kass, R. E., & Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association, 90, 928–934.
Kass, R. E., & Wasserman, L. (1996). The selection of prior distributions by formal rules. Journal of the American Statistical Association, 91, 1343–1370.
Killeen, P. R. (2005a). An alternative to null-hypothesis significance tests. Psychological Science, 16, 345–353.
Killeen, P. R. (2006). Beyond statistical inference: A decision theory for science. Psychonomic Bulletin & Review, 13, 549–562.
Klugkist, I., Laudy, O., & Hoijtink, H. (2005). Inequality constrained analysis of variance: A Bayesian approach. Psychological Methods, 10, 477–493.
Lee, M. D. (2002). Generating additive clustering models with limited stochastic complexity. Journal of Classification, 19, 69–85.
Lee, M. D., & Pope, K. J. (2006). Model selection for the rate problem: A comparison of significance testing, Bayesian, and minimum description length statistical inference. Journal of Mathematical Psychology, 50, 193–202.
Lee, M. D., & Wagenmakers, E.-J. (2005). Bayesian statistical inference in psychology: Comment on Trafimow (2003). Psychological Review, 112, 662–668.
Lee, P. M. (1989). Bayesian statistics: An introduction. New York: Oxford University Press.
Lindley, D. V. (1972). Bayesian statistics: A review. Philadelphia: Society for Industrial & Applied Mathematics.
Lindley, D. V. (1982). Scoring rules and the inevitability of probability. International Statistical Review, 50, 1–26.
Lindley, D. V. (1993). The analysis of experimental data: The appreciation of tea and wine. Teaching Statistics, 15, 22–25.
Lindley, D. V., & Phillips, L. D. (1976). Inference for a Bernoulli process (a Bayesian view). American Statistician, 30, 112–119.
Lindley, D. V., & Scott, W. F. (1984). New Cambridge elementary statistical tables. Cambridge: Cambridge University Press.
Loftus, G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5, 161–171.
Loftus, G. R. (2002). Analysis, interpretation, and visual presentation of experimental data. In H. Pashler (Ed. in Chief) & J. Wixted (Vol. Ed.), Stevens’ Handbook of experimental psychology: Vol. 4. Methodology in experimental psychology (3rd ed., pp. 339–390). New York: Wiley.
Ludbrook, J. (2003). Interim analyses of data as they accumulate in laboratory experimentation. BMC Medical Research Methodology, 3, 15.
McCarroll, D., Crays, N., & Dunlap, W. P. (1992). Sequential ANOVAs and Type I error rates. Educational & Psychological Measurement, 52, 387–393.
Myung, I. J. (2000). The importance of complexity in model selection. Journal of Mathematical Psychology, 44, 190–204.
Myung, I. J., Forster, M. R., & Browne, M. W. (Eds.) (2000). Model selection [Special issue]. Journal of Mathematical Psychology, 44(1).
Myung, I. J., Navarro, D. J., & Pitt, M. A. (2006). Model selection by normalized maximum likelihood. Journal of Mathematical Psychology, 50, 167–179.
Myung, I. J., & Pitt, M. A. (1997). Applying Occam’s razor in modeling cognition: A Bayesian approach. Psychonomic Bulletin & Review, 4, 79–95.
Nelson, N., Rosenthal, R., & Rosnow, R. L. (1986). Interpretation of significance levels and effect sizes by psychological researchers. American Psychologist, 41, 1299–1301.
Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society: Series A, 231, 289–337.
Nickerson, R. S. (2000). Null hypothesis statistical testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301.
O’Hagan, A. (1997). Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society: Series B, 57, 99–138.
O’Hagan, A., & Forster, J. (2004). Kendall’s advanced theory of statistics: Vol. 2B. Bayesian inference (2nd ed.). London: Arnold.
Pauler, D. K. (1998). The Schwarz criterion and related methods for normal linear models. Biometrika, 85, 13–27.
Peto, R., Pike, M. C., Armitage, P., Breslow, N. E., Cox, D. R., Howard, S. V., et al. (1976). Design and analysis of randomized clinical trials requiring prolonged observation of each patient: I. Introduction and design. British Journal of Cancer, 34, 585–612.
Pitt, M. A., Myung, I. J., & Zhang, S. (2002). Toward a method of selecting among computational models of cognition. Psychological Review, 109, 472–491.
Pocock, S. J. (1983). Clinical trials: A practical approach. New York: Wiley.
Pratt, J. W. (1961). [Review of Lehmann, E. L., Testing statistical hypotheses]. Journal of the American Statistical Association, 56, 163–167.
Pratt, J. W. (1962). On the foundations of statistical inference: Discussion. Journal of the American Statistical Association, 57, 314–315.
Raftery, A. E. (1993). Bayesian model selection in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 163–180). Newbury Park, CA: Sage.
Raftery, A. E. (1995). Bayesian model selection in social research. In P. V. Marsden (Ed.), Sociological methodology 1995 (pp. 111–196). Cambridge, MA: Blackwell.
Raftery, A. E. (1996). Hypothesis testing and model selection. In W. R. Gilks, S. Richardson, & D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 163–187). Boca Raton, FL: Chapman & Hall/CRC.
Rissanen, J. (2001). Strong optimality of the normalized ML models as universal codes and information in data. IEEE Transactions on Information Theory, 47, 1712–1717.
Rosenthal, R., & Gaito, J. (1963). The interpretation of levels of significance by psychological researchers. Journal of Psychology, 55, 33–38.
Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12, 573–604.
Rouder, J. N., Lu, J., Speckman, P., Sun, D., & Jiang, Y. (2005). A hierarchical model for estimating response time distributions. Psychonomic Bulletin & Review, 12, 195–223.
Royall, R. M. (1997). Statistical evidence: A likelihood paradigm. London: Chapman & Hall.
Savage, L. J. (1954). The foundations of statistics. New York: Wiley.
Schervish, M. J. (1996). P values: What they are and what they are not. American Statistician, 50, 203–206.
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1, 115–129.
Sellke, T., Bayarri, M.-J., & Berger, J. O. (2001). Calibration of p values for testing precise null hypotheses. American Statistician, 55, 62–71.
Smith, A. F. M., & Spiegelhalter, D. J. (1980). Bayes factors and choice criteria for linear models. Journal of the Royal Statistical Society: Series B, 42, 213–220.
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions (with discussion). Journal of the Royal Statistical Society: Series B, 36, 111–147.
Strube, M. J. (2006). SNOOP: A program for demonstrating the consequences of premature and repeated null hypothesis testing. Behavior Research Methods, 38, 24–27.
Stuart, A., Ord, J. K., & Arnold, S. (1999). Kendall’s advanced theory of statistics: Vol. 2A. Classical inference and the linear model (6th ed.). London: Arnold.
Trafimow, D. (2003). Hypothesis testing and theory evaluation at the boundaries: Surprising insights from Bayes’s theorem. Psychological Review, 110, 526–535.
Vickers, D., Lee, M. D., Dry, M., & Hughes, P. (2003). The roles of the convex hull and the number of potential intersections in performance on visually presented traveling salesperson problems. Memory & Cognition, 31, 1094–1104.
Wagenmakers, E.-J. (2003). How many parameters does it take to fit an elephant? [Book review]. Journal of Mathematical Psychology, 47, 580–586.
Wagenmakers, E.-J., & Farrell, S. (2004). AIC model selection using Akaike weights. Psychonomic Bulletin & Review, 11, 192–196.
Wagenmakers, E.-J., & Grünwald, P. (2006). A Bayesian perspective on hypothesis testing: A comment on Killeen (2005). Psychological Science, 17, 641–642.
Wagenmakers, E.-J., Grünwald, P., & Steyvers, M. (2006). Accumulative prediction error and the selection of time series models. Journal of Mathematical Psychology, 50, 149–166.
Wagenmakers, E.-J., Ratcliff, R., Gomez, P., & Iverson, G. J. (2004). Assessing model mimicry using the parametric bootstrap. Journal of Mathematical Psychology, 48, 28–50.
Wagenmakers, E.-J., & Waldorp, L. (Eds.) (2006). Model selection: Theoretical developments and applications [Special issue]. Journal of Mathematical Psychology, 50(2).
Wainer, H. (1999). One cheer for null hypothesis significance testing. Psychological Methods, 4, 212–213.
Wallace, C. S., & Dowe, D. L. (1999). Refinements of MDL and MML coding. Computer Journal, 42, 330–337.
Ware, J. H. (1989). Investigating therapies of potentially great benefit: ECMO. Statistical Science, 4, 298–340.
Wasserman, L. (2000). Bayesian model selection and model averaging. Journal of Mathematical Psychology, 44, 92–107.
Wasserman, L. (2004). All of statistics: A concise course in statistical inference. New York: Springer.
Weakliem, D. L. (1999). A critique of the Bayesian information criterion for model selection. Sociological Methods & Research, 27, 359–397.
Wilkinson, L., & the Task Force on Statistical Inference (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.
Winship, C. (1999). Editor’s introduction to the special issue on the Bayesian information criterion. Sociological Methods & Research, 27, 355–358.