A comparison of Cohen’s Kappa and Gwet’s AC1 when calculating inter-rater reliability coefficients: a study conducted with personality disorder samples

BMC Medical Research Methodology - Volume 13, Issue 1 - 2013
Nahathai Wongpakaran1, Tinakon Wongpakaran1, Danny Wedding2, Kilem L. Gwet3
1Department of Psychiatry, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand
2California School of Professional Psychology, Alliant International University, San Francisco, USA
3Statistical Consultant Advanced Analytics, Gaithersburg, USA

Abstract

Keywords


References

First MB, Gibbon M, Spitzer RL, Williams JBW, Benjamin LS: Structured Clinical Interview for DSM-IV Axis II Personality Disorder (SCID-II). 1997, Washington, DC: American Psychiatric Press

Cohen J: A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960, 20: 37-46. 10.1177/001316446002000104.

Cohen J: Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychol Bull. 1968, 70: 213-220.

Wongpakaran T, Wongpakaran N, Bookkamana P, Boonyanaruthee V, Pinyopornpanish M, Likhitsathian S, Suttajit S, Srisutadsanavong U: Interrater reliability of Thai version of the Structured Clinical Interview for DSM-IV Axis II Personality Disorders (T-SCID II). J Med Assoc Thai. 2012, 95: 264-269.

Dreessen L, Arntz A: Short-interval test-retest interrater reliability of the Structured Clinical Interview for DSM-III-R personality disorders (SCID-II) in outpatients. J Pers Disord. 1998, 12: 138-148. 10.1521/pedi.1998.12.2.138.

Weertman A, Arntz A, Dreessen L, van Velzen C, Vertommen S: Short-interval test-retest interrater reliability of the Dutch version of the Structured Clinical Interview for DSM-IV personality disorders (SCID-II). J Pers Disord. 2003, 17: 562-567. 10.1521/pedi.17.6.562.25359.

Cicchetti DV, Feinstein AR: High agreement but low kappa: II. Resolving the paradoxes. J Clin Epidemiol. 1990, 43: 551-558. 10.1016/0895-4356(90)90159-M.

Di Eugenio B, Glass M: The Kappa Statistic: A Second Look. Comput Linguist. 2004, 30: 95-101. 10.1162/089120104773633402.

Gwet KL: Handbook of Inter-Rater Reliability. The Definitive Guide to Measuring the Extent of Agreement Among Raters. 2nd edition. 2010, Gaithersburg, MD 20886–2696, USA: Advanced Analytics, LLC

Gwet KL: Computing inter-rater reliability and its variance in the presence of high agreement. Br J Math Stat Psychol. 2008, 61: 29-48. 10.1348/000711006X126600.

Kittirattanapaiboon P, Khamwongpin M: The Validity of the Mini International Neuropsychiatric Interview (M.I.N.I.)-Thai Version. Journal of Mental Health of Thailand. 2005, 13: 126-136.

Gwet K: Inter-Rater Reliability: Dependency on Trait Prevalence and Marginal Homogeneity. http://www.agreestat.com/research_papers/inter_rater_reliability_dependency.pdf

Gwet K: Kappa is not satisfactory for assessing the extent of agreement between raters. http://www.google.ca/url?sa=t&rct=j&q=kappa%20statistic%20is%20not%

Day FC, Schriger DL, Annals Of Emergency Medicine Journal Club: A consideration of the measurement and reporting of interrater reliability: answers to the July 2009 Journal Club questions. Ann Emerg Med. 2009, 54: 843-853. 10.1016/j.annemergmed.2009.07.013.

Arntz A, van Beijsterveldt B, Hoekstra R, Hofman A, Eussen M, Sallaerts S: The interrater reliability of a Dutch version of the Structured Clinical Interview for DSM-III-R Personality Disorders. Acta Psychiatr Scand. 1992, 85: 394-400. 10.1111/j.1600-0447.1992.tb10326.x.

Lobbestael J, Leurgans M, Arntz A: Inter-rater reliability of the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID I) and Axis II Disorders (SCID II). Clin Psychol Psychother. 2011, 18: 75-79. 10.1002/cpp.693.

Kongerslev M, Moran P, Bo S, Simonsen E: Screening for personality disorder in incarcerated adolescent boys: preliminary validation of an adolescent version of the standardised assessment of personality - abbreviated scale (SAPAS-AV). BMC Psychiatry. 2012, 12: 94. 10.1186/1471-244X-12-94.

Chan YH: Biostatistics 104: correlational analysis. Singapore Med J. 2003, 44: 614-619.

Hartling L, Bond K, Santaguida PL, Viswanathan M, Dryden DM: Testing a tool for the classification of study designs in systematic reviews of interventions and exposures showed moderate reliability and low accuracy. J Clin Epidemiol. 2011, 64: 861-871. 10.1016/j.jclinepi.2011.01.010.

Hernaez R, Lazo M, Bonekamp S, Kamel I, Brancati FL, Guallar E, Clark JM: Diagnostic accuracy and reliability of ultrasonography for the detection of fatty liver: a meta-analysis. Hepatology. 2011, 54: 1082-1090.

Sheehan DV, Sheehan KH, Shytle RD, Janavs J, Bannon Y, Rogers JE, Milo KM, Stock SL, Wilkinson B: Reliability and validity of the Mini International Neuropsychiatric Interview for Children and Adolescents (MINI-KID). J Clin Psychiatry. 2010, 71: 313-326. 10.4088/JCP.09m05305whi.

Ingenhoven TJ, Duivenvoorden HJ, Brogtrop J, Lindenborn A, van den Brink W, Passchier J: Interrater reliability for Kernberg's structural interview for assessing personality organization. J Pers Disord. 2009, 23: 528-534. 10.1521/pedi.2009.23.5.528.

Øiesvold T, Nivison M, Hansen V, Sørgaard KW, Østensen L, Skre I: Classification of bipolar disorder in psychiatric hospital. A prospective cohort study. BMC Psychiatry. 2012, 12: 13.

Clement S, Brohan E, Jeffery D, Henderson C, Hatch SL, Thornicroft G: Development and psychometric properties the Barriers to Access to Care Evaluation scale (BACE) related to people with mental ill health. BMC Psychiatry. 2012, 12: 36. 10.1186/1471-244X-12-36.

McCoul ED, Smith TL, Mace JC, Anand VK, Senior BA, Hwang PH, Stankiewicz JA, Tabaee A: Interrater agreement of nasal endoscopy in patients with a prior history of endoscopic sinus surgery. Int Forum Allergy Rhinol. 2012, 2: 453-459. 10.1002/alr.21058.

Ansari NN, Naghdi S, Forogh B, Hasson S, Atashband M, Lashgari E: Development of the Persian version of the Modified Modified Ashworth Scale: translation, adaptation, and examination of interrater and intrarater reliability in patients with poststroke elbow flexor spasticity. Disabil Rehabil. 2012, 34: 1843-1847. 10.3109/09638288.2012.665133.

Gisev N, Bell JS, Chen TF: Interrater agreement and interrater reliability: Key concepts, approaches, and applications. Res Social Adm Pharm. In press.

Petzold A, Altintas A, Andreoni L, Bartos A, Berthele A, Blankenstein MA, Buee L, Castellazzi M, Cepok S, Comabella M: Neurofilament ELISA validation. J Immunol Methods. 2010, 352: 23-31. 10.1016/j.jim.2009.09.014.

Yusuff KB, Tayo F: Frequency, types and severity of medication use-related problems among medical outpatients in Nigeria. Int J Clin Pharm. 2011, 33: 558-564. 10.1007/s11096-011-9508-z.