Pitfalls in the use of kappa when interpreting agreement between multiple raters in reliability studies

Physiotherapy - Volume 100 - Pages 27-35 - 2014
Shaun O’Leary (1,2), Marte Lund (1,3), Tore Johan Ytre-Hauge (1,4), Sigrid Reiersen Holm (1,5), Kaja Naess (1,6), Lars Nagelstad Dalland (1,7), Steven M. McPhail (8,9)
(1) NHMRC Centre for Clinical Research Excellence in Spinal Pain, Injury and Health, University of Queensland, Brisbane, QLD 4072, Australia
(2) Physiotherapy Department, Royal Brisbane and Women's Hospital, Queensland Health, Herston, Brisbane, QLD 4029, Australia
(3) Norwegian Sports Medicine Clinic (NIMI), Oslo, Norway
(4) Medi 3 Clinic, Aalesund, Norway
(5) University Hospital of Northern Norway, Tromsø, Norway
(6) Hans & Olaf Physiotherapy Clinic, Oslo, Norway
(7) Eggedal Physiotherapy Clinic, Sigdal, Norway
(8) Centre for Functioning and Health Research, Queensland Health, Cnr of Ipswich Road and Cornwall Street, Brisbane, Australia
(9) School of Public Health and Institute of Health and Biomedical Innovation, Queensland University of Technology, Victoria Park Road, Brisbane, Australia
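As a hedged illustration of the kind of pitfall the title refers to, the Python sketch below computes Fleiss' kappa (Fleiss, 1971) for two invented rating tables in which three raters classify ten subjects as "yes" or "no". The function name `fleiss_kappa` and both data sets are hypothetical and are not taken from this study; the sketch only shows the prevalence effect discussed by Feinstein (1990), Cicchetti (1990) and Byrt (1993), where two tables with identical observed agreement can yield quite different kappa values once one category dominates.

```python
from typing import Dict, List, Sequence, Tuple


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (mean observed agreement, Fleiss' kappa) for a count table.

    Hypothetical helper. counts[i][j] is the number of raters who placed
    subject i in category j. Every subject must be rated by the same
    number of raters (at least 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Per-subject agreement: share of (ordered) rater pairs that chose
    # the same category, sum_j c_ij * (c_ij - 1) / (n * (n - 1)).
    pairs = n_raters * (n_raters - 1)
    p_i = [sum(c * (c - 1) for c in row) / pairs for row in counts]
    p_bar = sum(p_i) / n_subjects

    # Expected chance agreement from the pooled category proportions.
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    if p_e >= 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")

    return p_bar, (p_bar - p_e) / (1.0 - p_e)


# Invented tables (assumption): columns are [yes, no]; three raters, ten subjects.
balanced: List[List[int]] = [[3, 0]] * 4 + [[0, 3]] * 4 + [[2, 1]] * 2
skewed: List[List[int]] = [[3, 0]] * 1 + [[0, 3]] * 7 + [[1, 2]] * 2

results: Dict[str, Tuple[float, float]] = {
    name: fleiss_kappa(table)
    for name, table in [("balanced", balanced), ("skewed", skewed)]
}

if __name__ == "__main__":
    for name, (p_bar, kappa) in results.items():
        print(f"{name}: observed agreement {p_bar:.3f}, kappa {kappa:.3f}")
```

Both invented tables give the same mean observed agreement (26/30, about 0.867), yet kappa is about 0.73 for the balanced table and 0.52 for the skewed one, because expected chance agreement rises as ratings concentrate in the "no" category. The sketch assumes a complete design (every subject rated by every rater) and raises an error rather than returning a value when all ratings fall in a single category.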

References

Fedorak, 2003, Reliability of the visual assessment of cervical and lumbar lordosis: how good are we?, Spine, 28, 1857, 10.1097/01.BRS.0000083281.48923.BD
Kibler, 2002, Qualitative clinical evaluation of scapular dysfunction: a reliability study, J Shoulder Elbow Surg, 11, 550, 10.1067/mse.2002.126766
Hickey, 2007, Accuracy and reliability of observational motion analysis in identifying shoulder symptoms, Man Ther, 12, 263, 10.1016/j.math.2006.05.005
McClure, 2009, A clinical method for identifying scapular dyskinesis, part 1: reliability, J Athl Train, 44, 160, 10.4085/1062-6050-44.2.160
Tooth, 2004, The kappa statistic in rehabilitation research: an examination, Arch Phys Med Rehabil, 85, 1371, 10.1016/j.apmr.2003.12.002
Sim, 2005, The kappa statistic in reliability studies: use, interpretation, and sample size requirements, Phys Ther, 85, 257, 10.1093/ptj/85.3.257
Cohen, 1960, A coefficient of agreement for nominal scales, Educ Psychol Meas, 20, 37, 10.1177/001316446002000104
McPhail, 2009, Telephone reliability of the Frenchay Activity Index and EQ-5D amongst older adults, Health Qual Life Outcomes, 7, 48, 10.1186/1477-7525-7-48
McPhail, 2008, Two perspectives of proxy reporting of health-related quality of life using the Euroqol-5D, an investigation of agreement, Med Care, 46, 1140, 10.1097/MLR.0b013e31817d69a6
McPhail, 2012, Patients undergoing subacute rehabilitation have accurate expectations of their health-related quality of life at discharge, Health Qual Life Outcomes, 10, 94, 10.1186/1477-7525-10-94
Landis, 1977, The measurement of observer agreement for categorical data, Biometrics, 33, 159, 10.2307/2529310
Cicchetti, 1990, High agreement but low kappa: II. Resolving the paradoxes, J Clin Epidemiol, 43, 551, 10.1016/0895-4356(90)90159-M
Feinstein, 1990, High agreement but low kappa: I. The problems of two paradoxes, J Clin Epidemiol, 43, 543, 10.1016/0895-4356(90)90158-L
Byrt, 1993, Bias, prevalence and kappa, J Clin Epidemiol, 46, 423, 10.1016/0895-4356(93)90018-V
Hoehler, 2000, Bias and prevalence effects on kappa viewed in terms of sensitivity and specificity, J Clin Epidemiol, 53, 499, 10.1016/S0895-4356(99)00174-2
Fleiss, 1971, Measuring nominal scale agreement among many raters, Psychol Bull, 76, 378, 10.1037/h0031619
O’Toole, 2010, Evaluation of computed tomography for determining the diagnosis of acetabular fractures, J Orthop Trauma, 24, 284, 10.1097/BOT.0b013e3181c83bc0
Behensky, 2002, Multisurgeon assessment of coronal pattern classification systems for adolescent idiopathic scoliosis: reliability and error analysis, Spine (Phila Pa 1976), 27, 762, 10.1097/00007632-200204010-00015
Riddle, 2002, Evaluation of the presence of sacroiliac joint region dysfunction using a combination of tests: a multicenter intertester reliability study, Phys Ther, 82, 772, 10.1093/ptj/82.8.772
Jull, 2008
Sahrmann, 2002
Craven, 2009, Modified Ashworth scale reliability for measurement of lower extremity spasticity among patients with SCI, Spinal Cord, 48, 207, 10.1038/sc.2009.107
Daley, 1999, Reliability of scores on the Stroke Rehabilitation Assessment of Movement (STREAM) measure, Phys Ther, 79, 8, 10.1093/ptj/79.1.8
Sakzewski, 2007, Clinimetric properties of participation measures for 5- to 13-year-old children with cerebral palsy: a systematic review, Dev Med Child Neurol, 49, 232, 10.1111/j.1469-8749.2007.00232.x
McPhail, 2012, Intratherapist reliability in the rating of scapula posture in multiple planes of reference, ISRN Rehabil, 2012
Karduna, 2001, Dynamic measurements of three-dimensional scapular kinematics: a validation study, J Biomech Eng, 123, 184, 10.1115/1.1351892
McClure, 2001, Direct 3-dimensional measurement of scapular kinematics during dynamic movements in vivo, J Shoulder Elbow Surg, 10, 269, 10.1067/mse.2001.112954
Sobush, 1996, The Lennie test for measuring scapular position in healthy young adult females: a reliability and validity study, J Orthop Sports Phys Ther, 23, 39, 10.2519/jospt.1996.23.1.39
Nijs, 2007, Clinical assessment of scapular positioning in patients with shoulder pain: state of the art, J Manipulative Physiol Ther, 30, 69, 10.1016/j.jmpt.2006.11.012
Bland, 1986, Statistical methods for assessing agreement between two methods of clinical measurement, Lancet, 1, 307, 10.1016/S0140-6736(86)90837-8