Reliability of the PEDro Scale for Rating Quality of Randomized Controlled Trials

Physical Therapy - Volume 83, Issue 8 - Pages 713-721 - 2003
Christopher G. Maher¹, Catherine Sherrington², Rob Herbert³, Anne M. Moseley⁴, Mark R. Elkins⁵
¹ CG Maher, PT, PhD, is Associate Professor, School of Physiotherapy, Faculty of Health Sciences, The University of Sydney, PO Box 170, Lidcombe, New South Wales 1825, Australia
² C Sherrington, PT, PhD, is Research Officer, Prince of Wales Medical Research Institute, University of New South Wales, Sydney, New South Wales, Australia
³ RD Herbert, PT, PhD, is Senior Lecturer, School of Physiotherapy, The University of Sydney
⁴ AM Moseley, PT, PhD, is Lecturer, Rehabilitation Studies Unit, Department of Medicine, The University of Sydney
⁵ M Elkins, PT, M-HSc, is Research Physiotherapist, Department of Respiratory Medicine, Royal Prince Alfred Hospital, Camperdown, New South Wales, Australia

Abstract

Background and Purpose. Assessment of the quality of randomized controlled trials (RCTs) is common practice in systematic reviews. However, the reliability of data obtained with most quality assessment scales has not been established. This report describes 2 studies designed to investigate the reliability of data obtained with the Physiotherapy Evidence Database (PEDro) scale developed to rate the quality of RCTs evaluating physical therapist interventions.

Method. In the first study, 11 raters independently rated 25 RCTs randomly selected from the PEDro database. In the second study, 2 raters rated 120 RCTs randomly selected from the PEDro database, and disagreements were resolved by a third rater; this generated a set of individual rater and consensus ratings. The process was repeated by independent raters to create a second set of individual and consensus ratings. Reliability of ratings of PEDro scale items was calculated using multirater kappas, and reliability of the total (summed) score was calculated using intraclass correlation coefficients (ICC [1,1]).

Results. The kappa value for each of the 11 items ranged from .36 to .80 for individual assessors and from .50 to .79 for consensus ratings generated by groups of 2 or 3 raters. The ICC for the total score was .56 (95% confidence interval=.47–.65) for ratings by individuals, and the ICC for consensus ratings was .68 (95% confidence interval=.57–.76).

Discussion and Conclusion. The reliability of ratings of PEDro scale items varied from “fair” to “substantial,” and the reliability of the total PEDro score was “fair” to “good.”
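The abstract names two statistics without giving their formulas: a multirater kappa for each PEDro item and ICC(1,1) for the total score. The following is a minimal sketch of one standard way to compute each, assuming Fleiss' kappa as the multirater variant (the abstract does not say which variant was used) and complete data in which every trial is rated the same number of times. The function names `fleiss_kappa`, `icc_1_1`, and `landis_koch` are hypothetical; this is an illustration, not the authors' analysis code.

```python
"""Sketch of the two reliability statistics named in the abstract.

Assumptions (not from the paper): the multirater kappa is Fleiss' kappa, and
the data are complete, i.e. every trial is rated the same number of times.
"""
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' multirater kappa for one PEDro item.

    counts[i][j] is the number of raters who put trial i in category j,
    e.g. j=0 "criterion not satisfied" and j=1 "criterion satisfied".
    """
    n = len(counts)
    m = sum(counts[0])  # raters per trial
    if m < 2 or any(sum(row) != m for row in counts):
        raise ValueError("each trial needs the same number (>= 2) of raters")
    k = len(counts[0])
    # Observed agreement per trial: share of rater pairs that agree.
    p_i = [(sum(c * c for c in row) - m) / (m * (m - 1)) for row in counts]
    p_bar = sum(p_i) / n
    # Chance agreement from the pooled category proportions.
    p_j = [sum(row[j] for row in counts) / (n * m) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


def icc_1_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(1,1): one-way random-effects, single-rating reliability.

    scores[i][r] is the r-th total (summed) PEDro score recorded for trial i.
    """
    n, k = len(scores), len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("each trial needs the same number (>= 2) of scores")
    grand = sum(sum(row) for row in scores) / (n * k)
    means = [sum(row) / k for row in scores]
    # One-way ANOVA: between-trials and within-trial mean squares.
    ms_between = k * sum((mu - grand) ** 2 for mu in means) / (n - 1)
    ms_within = sum(
        (x - mu) ** 2 for row, mu in zip(scores, means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def landis_koch(kappa: float) -> str:
    """Agreement label for a kappa value using the Landis and Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

For example, `landis_koch(0.36)` returns `"fair"` and `landis_koch(0.80)` returns `"substantial"`, which matches the labels the abstract gives to the ends of its item-kappa range. These bands are meant for kappa only; the “fair” to “good” wording used for the ICC follows a separate convention and is not encoded in this sketch.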
