Statistical significance: p value, 0.05 threshold, and applications to radiomics - reasons for a conservative approach
Abstract
Keywords