Reliability of assessment tools in rehabilitation: an illustration of appropriate statistical analyses

Clinical Rehabilitation - Volume 12, Issue 3 - Pages 187-199 - 1998
G. Rankin, María Stokes
Royal Hospital for Neuro-disability, London, UK

Abstract

Objective: To provide a practical guide to appropriate statistical analysis of a reliability study, using real-time ultrasound measurement of muscle size as an example.
Design: Inter-rater and intra-rater (between-scans and between-days) reliability.
Subjects: Ten normal subjects (five male) aged 22–58 years.
Method: The cross-sectional area (CSA) of the anterior tibial muscle group was measured using real-time ultrasonography.
Main outcome measures: Intraclass correlation coefficients (ICCs) with their 95% confidence intervals (CIs), and the Bland and Altman method for assessing agreement, which includes calculation of the mean difference between measures (d), the 95% CI for d, the standard deviation of the differences (SDdiff), the 95% limits of agreement and a reliability coefficient.
Results: Inter-rater reliability was high: ICC (3,1) was 0.92 with a 95% CI of 0.72 → 0.98. There was reasonable agreement between measures on the Bland and Altman test: d was -0.63 cm², the 95% CI for d was -1.4 → 0.14 cm², the SDdiff was 1.08 cm², the 95% limits of agreement were -2.73 → 1.53 cm² and the reliability coefficient was 2.4. Between-scans repeatability was high: ICCs (1,1) were 0.94 and 0.93, with 95% CIs of 0.8 → 0.99 and 0.75 → 0.98, for days 1 and 2 respectively. Measures showed good agreement on the Bland and Altman test: d was 0.15 cm² for day 1 and -0.32 cm² for day 2; the 95% CIs for d were -0.51 → 0.81 cm² for day 1 and -0.98 → 0.34 cm² for day 2; the SDdiff was 0.93 cm² for both days; the 95% limits of agreement were -1.71 → 2.01 cm² for day 1 and -2.18 → 1.54 cm² for day 2; and the reliability coefficient was 1.80 for day 1 and 1.88 for day 2. The between-days ICC (1,2) was 0.92, with a 95% CI of 0.69 → 0.98. The d was -0.98 cm² and the SDdiff was 1.25 cm², with 95% limits of agreement of -3.48 → 1.52 cm² and a reliability coefficient of 2.8. The 95% CI for d (-1.88 → -0.08 cm²) and the distribution graph showed a bias towards a larger measurement on day 2.
Conclusions: The ICC and Bland and Altman tests are appropriate for analysis of reliability studies of similar design to that described, but neither test alone provides sufficient information and it is recommended that both are used.
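
The statistics named in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' own procedure: the function names are hypothetical, the ICC formulas are the standard Shrout and Fleiss mean-square forms, the limits of agreement are assumed to be d ± 2 SDdiff, and the reliability coefficient is assumed to be twice the root mean square of the differences (the abstract does not state either formula). The t critical value is passed in by the caller; 2.262 is the two-sided 95% value for 9 degrees of freedom (ten subjects).

```python
import math
from statistics import mean


def icc_from_table(scores):
    """ICC(1,1), ICC(1,k) and ICC(3,1) from an n-subjects x k-measures table."""
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Sums of squares for the one-way and two-way ANOVA decompositions.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_measures = n * sum((m - grand) ** 2 for m in col_means)
    ss_within = ss_total - ss_subjects
    ss_error = ss_within - ss_measures

    bms = ss_subjects / (n - 1)            # between-subjects mean square
    wms = ss_within / (n * (k - 1))        # within-subjects mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return {
        "icc_1_1": (bms - wms) / (bms + (k - 1) * wms),
        "icc_1_k": (bms - wms) / bms,
        "icc_3_1": (bms - ems) / (bms + (k - 1) * ems),
    }


def bland_altman(d_bar, sd_diff, n, t_crit):
    """Agreement statistics from the mean and SD of n paired differences."""
    half_width = t_crit * sd_diff / math.sqrt(n)
    # Sum of squared differences recovered from the mean and SD.
    sum_sq = (n - 1) * sd_diff ** 2 + n * d_bar ** 2
    return {
        "ci_d": (d_bar - half_width, d_bar + half_width),
        # Assumption: limits taken as mean difference +/- 2 SD.
        "limits_of_agreement": (d_bar - 2 * sd_diff, d_bar + 2 * sd_diff),
        # Assumption: twice the root mean square of the differences.
        "reliability_coefficient": 2 * math.sqrt(sum_sq / n),
    }


# Inter-rater summary values reported in the abstract (d, SDdiff, n = 10).
inter_rater = bland_altman(d_bar=-0.63, sd_diff=1.08, n=10, t_crit=2.262)
print({key: value for key, value in inter_rater.items()})
```

Under these assumptions the inter-rater inputs give a 95% CI for d of about -1.40 → 0.14 cm², an upper limit of agreement of 1.53 cm² and a reliability coefficient of about 2.4, in line with the reported values; the lower limit of agreement evaluates to -2.79 cm² rather than the reported -2.73 cm².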
