Statistical significance: p values, the 0.05 threshold, and applications to radiomics - reasons for a conservative approach
Abstract
In this article, we summarize the unresolved debate about the value
Keywords