Reitsma JB, Rutjes AW, Khan KS, Coomarasamy A, Bossuyt PM (2009) A review of solutions for diagnostic accuracy studies with an imperfect or missing reference standard. J Clin Epidemiol 62:797–806. https://doi.org/10.1016/j.jclinepi.2009.02.005
Rutjes AW, Reitsma JB, Coomarasamy A, Khan KS, Bossuyt PM (2007) Evaluation of diagnostic tests when there is no gold standard. A review of methods. Health Technol Assess 11:iii, ix-51. doi: https://doi.org/10.3310/hta11500
Bertens LC, Broekhuizen BD, Naaktgeboren CA et al (2013) Use of expert panels to define the reference standard in diagnostic research: a systematic review of published methods and reporting. PLoS Med 10:e1001531. https://doi.org/10.1371/journal.pmed.1001531
van den Berk IAH, Kanglie MMNP, van Engelen TSR et al (2018) OPTimal IMAging strategy in patients suspected of non-traumatic pulmonary disease at the emergency department: chest X-ray or ultra-low-dose CT (OPTIMACT)—a randomised controlled trial chest X-ray or ultra-low-dose CT at the ED: design and rationale. Diagn Progn Res 2:20. https://doi.org/10.1186/s41512-018-0038-1
Klein Klouwenberg PM, Ong DS, Bos LD et al (2013) Interobserver agreement of Centers for Disease Control and Prevention criteria for classifying infections in critically ill patients. Crit Care Med 41:2373–2378. https://doi.org/10.1097/CCM.0b013e3182923712
Twisk JWR (2017) Inleiding in de toegepaste biostatistiek [introduction to applied biostatistics]. Bohn Stafleu van Loghum, Houten
Fleiss JL, Cohen J, Everitt BS (1969) Large sample standard errors of kappa and weighted kappa. Psychol Bull 72:323–327 https://doi.org/10.1037/h0028106
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33:159–174 https://doi.org/10.2307/2529310
Landis JR, Koch GG (1977) An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics 33:363–374 doi: https://doi.org/10.2307/2529786
Lameris W, van Randen A, van Es HW et al (2009) Imaging strategies for detection of urgent conditions in patients with acute abdominal pain: diagnostic accuracy study. BMJ 338:b2431. https://doi.org/10.1136/bmj.b2431
Bankier AA, Levine D, Halpern EF, Kressel HY (2010) Consensus interpretation in imaging research: is there a better way? Radiology 257:14–17. https://doi.org/10.1148/radiol.10100252
Obuchowski NA, Zepp RC (1996) Simple steps for improving multiple-reader studies in radiology. AJR Am J Roentgenol 166:517–521. https://doi.org/10.2214/ajr.166.3.8623619
Copeland KT, Checkoway H, McMichael AJ, Holbrook RH (1977) Bias due to misclassification in the estimation of relative risk. Am J Epidemiol 105:488–495. https://doi.org/10.1093/oxfordjournals.aje.a112408
Jurek AM, Greenland S, Maldonado G, Church TR (2005) Proper interpretation of non-differential misclassification effects: expectations vs observations. Int J Epidemiol 34:680–687. https://doi.org/10.1093/ije/dyi060
Boyko EJ, Alderman BW, Baron AE (1988) Reference test errors bias the evaluation of diagnostic tests for ischemic heart disease. J Gen Intern Med 3:476–481. https://doi.org/10.1007/BF02595925