Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed

International Journal of Nursing Studies - Tập 48 - Trang 661-671 - 2011
Jan Kottner1, Laurent Audige2, Stig Brorson3, Allan Donner4, Byron J. Gajewski5, Asbjørn Hróbjartsson6, Chris Roberts7, Mohamed Shoukri8, David L. Streiner9
1Fanningerstr. 61, 10365 Berlin, Germany
2AO Clinical Investigation and Documentation, Dübendorf, Switzerland
3Department of Orthopaedic Surgery, Herlev University Hospital, Herlev, Denmark
4Department of Epidemiology and Biostatistics, Schulich School of Medicine and Dentistry, The University of Western Ontario, London, Ontario, Canada
5Department of Biostatistics, University of Kansas Schools of Medicine & Nursing, Kansas City, KS, USA
6The Nordic Cochrane Centre, Rigshospitalet, Copenhagen, Denmark
7School of Community Based Medicine, The University of Manchester, Manchester, UK
8Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Center, The University of Western Ontario, London, Ontario, Canada
9Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada

Tài liệu tham khảo

Altaye, 2001, A general goodness-of-fit approach for inference procedures concerning the kappa statistic, Statistics in Medicine, 20, 2479, 10.1002/sim.911 American Educational Research Association, American Psychological Association, National Council on Measurement in Education, 1999. Standards for Educational and Psychological Testing. American Educational Research Association, Washington. Audigé, 2005, A concept for the validation of fracture classifications, Journal of Orthopaedic Trauma, 19, 404, 10.1097/01.bot.0000155310.04886.37 Audigé, 2004, How reliable are reliability studies of fracture classifications?, Acta Orthopaedica Scandinavica, 75, 184, 10.1080/00016470412331294445 Bååth, 2008, Interrater reliability using Modified Norton Scale, Pressure Ulcer Card, Short Form-Mini Nutritional Assessment by registered and enrolled nurses in clinical practice, Journal of Clinical Nursing, 17, 618, 10.1111/j.1365-2702.2007.02131.x Barlow, 1991, A comparison of methods for calculating a stratified kappa, Statistics in Medicine, 10, 1465, 10.1002/sim.4780100913 Barone, 2006, Should an Allen test be performed before radial artery cannulation?, The Journal of Trauma, 61, 468, 10.1097/01.ta.0000229815.43871.59 Bates-Jensen, 2008, Subepidermal moisture differentiates erythema and stage I pressure ulcers in nursing home residents, Wound Repair and Regeneration, 16, 189, 10.1111/j.1524-475X.2008.00359.x Beckman, 2004, How reliable are assessments of clinical teaching?, Journal of General Internal Medicine, 19, 971, 10.1111/j.1525-1497.2004.40066.x Bhat, 2005, Inter-rater reliability of delirium rating scales, Neuroepidemiology, 25, 48, 10.1159/000085440 Bland, 1990, A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement, Computers in Biology and Medicine, 20, 337, 10.1016/0010-4825(90)90013-F Bland, 1999, Measuring agreement in method comparison studies, Statistical Methods in Medical Research, 8, 135, 10.1191/096228099673819272 Bonnet, 2002, Sample size requirements for estimating intraclass correlations with desired precision, Statistics in Medicine, 21, 1331, 10.1002/sim.1108 Bossuyt, 2003, The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration, Annals of Internal Medicine, 138, W1, 10.7326/0003-4819-138-1-200301070-00012-w1 Bot, 2004, Clinimetric evaluation of shoulder disability questionnaires: a systematic review, Annals of the Rheumatic Diseases, 63, 335, 10.1136/ard.2003.007724 Bours, 1999, The development of a National Registration Form to measure the prevalence of pressure ulcers in the Netherlands, Ostomy Wound Management, 45, 28 Brorson, 2008, Training improves agreement among doctors using the Neer system for proximal humeral fractures in a systematic review, Journal of Clinical Epidemiology, 61, 7, 10.1016/j.jclinepi.2007.04.014 Buntinx, 1996, Inter-observer variation in the assessment of skin ulceration, Journal of Wound Care, 5, 166, 10.12968/jowc.1996.5.4.166 Cantor, 1996, Sample-size calculations for Cohen's kappa, Psychological Methods, 1, 150, 10.1037/1082-989X.1.2.150 Charter, 2003, Combining reliability coefficients: possible application to meta-analysis and reliability generalization, Psychological Reports, 93, 643, 10.2466/pr0.2003.93.3.643 Cicchetti, 1999, Sample size requirements for increasing the precision of reliability estimates: problems and proposed solutions, Journal of Clinical and Experimental Neuropsychology, 21, 567, 10.1076/jcen.21.4.567.886 Cicchetti, 2001, The precision of reliability and validity estimates re-visited: distinguishing between clinical and statistical significance of sample size requirements, Journal of Clinical and Experimental Neuropsychology, 23, 695, 10.1076/jcen.23.5.695.1249 Cicchetti, 2006, Rating scales, scales of measurement, issues of reliability, The Journal of Nervous and Mental Disease, 194, 557, 10.1097/01.nmd.0000230392.83607.c5 Colle, 2002, Impact of quality scales on levels of evidence inferred from a systematic review of exercise therapy and low back pain, Archives of Physical Medicine and Rehabilitation, 83, 1745, 10.1053/apmr.2002.35657 Darroch, 1986, Category distinguishability and observer agreement, Australian & New Zealand Journal of Statistics, 28, 371, 10.1111/j.1467-842X.1986.tb00709.x De Vet, 2006, When to use agreement versus reliability measures, Journal of Clinical Epidemiology, 59, 1033, 10.1016/j.jclinepi.2005.10.015 De Villiers, 2005, The Delphi technique in health sciences education research, Medical Teacher, 27, 639, 10.1080/13611260500069947 Defloor, 2004, Inter-rater reliability of the EPUAP pressure ulcer classification system using photographs, Journal of Clinical Nursing, 13, 952, 10.1111/j.1365-2702.2004.00974.x D’Olhaberriague, 1996, A reappraisal of reliability and validity studies in stroke, Stroke, 27, 2331, 10.1161/01.STR.27.12.2331 Donner, 1987, Sample size requirements for reliability studies, Statistics in Medicine, 6, 441, 10.1002/sim.4780060404 Donner, 1992, A goodness-of-fit approach to inference procedures for the kappa statistic: confidence interval construction, significance testing and sample size estimation, Statistics in Medicine, 11, 1511, 10.1002/sim.4780111109 Donner, 1994, Statistical implications of the choice between a dichotomous or continuous trait in studies of interobserver agreement, Biometrics, 50, 550, 10.2307/2533400 Donner, 1996, The statistical analysis of kappa statistics in multiple samples, Journal of Clinical Epidemiology, 49, 1053, 10.1016/0895-4356(96)00057-1 Dunn, 2004 Elkum, 2008, Signal-to-noise ratio (SNR) as a measure of reproducibility: design, estimation, and application, Health Services and Outcomes Research Methodology, 8, 119, 10.1007/s10742-008-0030-2 European Pressure Ulcer Advisory Panel, 2005. EPUAP Statement on Prevalence and Incidence Monitoring of Pressure Ulcer Occurrence 2005. Retrieved March 8, 2009, from http://www.epuap.org/review6_3/page5.html. Feinstein, 1990, High agreement but low kappa: I. The problems of two paradoxes, Journal of Clinical Epidemiology, 43, 543, 10.1016/0895-4356(90)90158-L Fink, 1984, Consensus methods: characteristics and guidelines for use, American Journal of Public Health, 74, 979, 10.2105/AJPH.74.9.979 Fleiss, 2003 Gajewski, 2007, Inter-rater reliability of pressure ulcer staging: probit Bayesian Hierarchical Model that allows for uncertain rater response, Statistics in Medicine, 26, 4602, 10.1002/sim.2877 Gardener, 1986, Confidence intervals rather than P values: estimation rather than hypothesis testing, British Medical Journal, 292, 746, 10.1136/bmj.292.6522.746 Giraudeau, 2001, Planning a reproducibility study: how many subjects and how many replicates per subject for an expected width of the 95 percent confidence interval of the intraclass correlation coefficient, Statistics in Medicine, 20, 3205, 10.1002/sim.935 Gjørup, 1988, The kappa coefficient and the prevalence of a diagnosis, Methods of Information in Medicine, 27, 184, 10.1055/s-0038-1635539 Glaser, 1980, Using behavioral science strategies for defining the state-of-the-art, Journal of Applied Behavioral Science, 16, 79, 10.1177/002188638001600107 Gould, 2004, Examining the validity of pressure ulcer risk assessment scales: a replication study, International Journal of Nursing Studies, 41, 331, 10.1016/j.ijnurstu.2003.10.005 Gouttebarge, 2004, Reliability and validity of functional capacity evaluation methods: a systematic review with reference to Blankenship system, Ergos work simulator, Ergo-Kit and Isernhagen work system, International Archives of Occupational and Environmental Health, 77, 527, 10.1007/s00420-004-0549-7 Gwet, 2008, Computing inter-rater reliability and its variance in the presence of high agreement, The British Journal of Mathematical and Statistical Psychology, 61, 29, 10.1348/000711006X126600 Hall, 2008, Intertester reliability and diagnostic validity of the cervical flexion-rotation test, Journal of Manipulative and Physiological Therapy, 31, 293, 10.1016/j.jmpt.2008.03.012 Hart, 2006, Reliability testing of the national database of nursing quality indicators pressure ulcer indicator, Journal of Nursing Care Quality, 21, 256, 10.1097/00001786-200607000-00011 Hestbaek, 2000, Are chiropractic tests for the lumbo-pelvic spine reliable and valid? A systematic review, Journal of Manipulative and Physiological Therapeutics, 23, 258, 10.1016/S0161-4754(00)90173-8 House, 1981, Measures of interobserver agreement: calculation formulas and distribution effects, Journal of Behavioral Assessment, 3, 37, 10.1007/BF01321350 Hutchings, 2006, A comparison of formal consensus methods used for developing clinical guidelines, Journal of Health Services Research & Policy, 11, 218, 10.1258/135581906778476553 Hwang, 2006, Representation of ophthalmology concepts by electronic systems: intercoder agreement among physicians using controlled terminologies, Ophtalmology, 113, 511, 10.1016/j.ophtha.2006.01.017 Innes, 1999, Reliability of work-related assessments, Work, 13, 107 Kadam, 2006, A comparison of two consensus methods for classifying morbidities in a single professional group showed the same outcomes, Journal of Clinical Epidemiology, 59, 1169, 10.1016/j.jclinepi.2006.02.016 Kirshner, 1985, A methodological framework for assessing health indices, Journal of Chronic Diseases, 38, 27, 10.1016/0021-9681(85)90005-0 Kobak, 2004, Rater training in multicenter clinical trials: issues and recommendations, Journal of Clinical Psychopharmacology, 24, 113, 10.1097/01.jcp.0000116651.91923.54 Kobak, 2005, A new approach to rater training and certification in a multicenter clinical trial, Journal of Clinical Psychopharmacology, 25, 407, 10.1097/01.jcp.0000177666.35016.a0 Kobak, 2008, A comparison of face-to-face and remote assessment of inter-rater reliability on the Hamilton Depression Rating Scale via videoconferencing, Psychiatry Research, 158, 99, 10.1016/j.psychres.2007.06.025 Kottner, 2008, Interpreting interrater reliability coefficients of the Braden scale: a discussion paper, International Journal of Nursing Studies, 45, 1239, 10.1016/j.ijnurstu.2007.08.001 Kottner, 2008, An interrater reliability study of the Braden scale in two nursing homes, International Journal of Nursing Studies, 45, 1501, 10.1016/j.ijnurstu.2008.02.007 Kottner, 2009, Inter- and intrarater reliability of the Waterlow pressure sore risk scale: a systematic review, International Journal of Nursing Studies, 46, 369, 10.1016/j.ijnurstu.2008.09.010 Kottner, 2009, A systematic review of interrater reliability of pressure ulcer classification systems, Journal of Clinical Nursing, 18, 315, 10.1111/j.1365-2702.2008.02569.x Kraemer, 1979, Ramifications of a population model for κ as a coefficient of reliability, Psychometrika, 44, 461, 10.1007/BF02296208 Kraemer, 1992, Measurement of reliability for categorical data in medical research, Statistical Methods in Medical Research, 1, 183, 10.1177/096228029200100204 Kraemer, 2002, Kappa coefficients in medical research, Statistics in Medicine, 21, 2109, 10.1002/sim.1180 Landis, 1977, The measurement of observer agreement for categorical data, Biometrics, 33, 159, 10.2307/2529310 Lee, 1989, Statistical evaluation of agreement between two methods for measuring a quantitative variable, Computers in Biology and Medicine, 19, 61, 10.1016/0010-4825(89)90036-X Lewicki, 2000, Sensitivity and specificity of the Braden scale in the cardiac surgical population, Journal of Wound Ostomy and Continence Nursing, 27, 36 Maclure, 1987, Misinterpretation and misuse of the kappa statistic, American Journal of Epidemiology, 126, 161, 10.1093/aje/126.2.161 McAlister, 1999, Why we need large, simple studies of clinical examination: the problem and a proposed solution, Lancet, 354, 1721, 10.1016/S0140-6736(99)01174-5 McGraw, 1996, Forming inferences about some intraclass correlation coefficients, Psychological Methods, 1, 30, 10.1037/1082-989X.1.1.30 Mok, 2008, Comparison of observer variation in conventional and three digital radiographic methods used in the evaluation of patients with adolescent idiopathic scoliosis, Spine, 33, 681, 10.1097/BRS.0b013e318166aa8d Müller, 1994, A critical discussion of intraclass correlation coefficients, Statistics in Medicine, 13, 2465, 10.1002/sim.4780132310 Mulsant, 2002, Interrater reliability in clinical trials of depressive disorders, The American Journal of Psychiatry, 159, 1598, 10.1176/appi.ajp.159.9.1598 Nanda, 2008, An assessment of the inter examiner reliability of clinical tests for subacromial impingement and rotator cuff integrity, European Journal of Orthopaedic Surgery and Traumatology, 18, 495, 10.1007/s00590-008-0341-6 Nunnally, 1994 Nwosu, 1998, Is real-time ultrasonic bladder volume estimation reliable and valid? A systematic overview, Scandinavian Journal of Urology and Nephrology, 32, 325, 10.1080/003655998750015278 Oot-Giromini, 1993, Pressure ulcer prevalence, incidence and associated risk factors in the community, Decubitus, 6, 24 Peat, 2005 Perkins, 2000, Penny-wise and pound-foolish: the impact of measurement error on sample size requirements in clinical trails, Biological Psychiatry, 47, 762, 10.1016/S0006-3223(00)00837-4 Polit, 2008 Ratanawongsa, 2008, The reported validity and reliability of methods for evaluating continuing medical education: a systematic review, Academic Medicine, 83, 274, 10.1097/ACM.0b013e3181637925 Richardson, 1972, Peer review of medical care, Medical Care, 10, 29, 10.1097/00005650-197201000-00004 Roberts, 1998, A matrix of kappa-type coefficients to assess the reliability of nominal scales, Statistics in Medicine, 17, 471, 10.1002/(SICI)1097-0258(19980228)17:4<471::AID-SIM745>3.0.CO;2-N Roberts, 2005, Assessing the reliability of ordered categorical scales using kappa-type statistics, Statistical Methods in Medical Research, 14, 493, 10.1191/0962280205sm413oa Rothery, 1979, A nonparametric measure of intraclass correlation, Biometrika, 66, 629, 10.1093/biomet/66.3.629 Rousson, 2002, Assessing intrarater, interrater and test–retest reliability of continuous measurements, Statistics in Medicine, 21, 3431, 10.1002/sim.1253 Sainsbury, 2005, Reliability of the Barthel Index when used with older people, Age Ageing, 34, 228, 10.1093/ageing/afi063 Saito, 2006, Effective number of subjects and number of raters for inter-rater reliability studies, Statistics in Medicine, 25, 1547, 10.1002/sim.2294 Scinto, 2001, The case for comprehensive quality indicator reliability assessment, Journal of Clinical Epidemiology, 54, 1103, 10.1016/S0895-4356(01)00381-X Shoukri, 2004 Shoukri, 2004, Sample size requirements for the design of reliability study: review and new results, Statistical Methods in Medical Research, 13, 251, 10.1191/0962280204sm365ra Shrout, 1998, Measurement reliability and agreement in psychiatry, Statistical Methods in Medical Research, 7, 301, 10.1191/096228098672090967 Shrout, 1979, Intraclass correlations: uses in assessing rater reliability, Psychological Bulletin, 86, 420, 10.1037/0033-2909.86.2.420 Slongo, 2006, Journal of Pediatric Orthopedics, 26, 43, 10.1097/01.bpo.0000187989.64021.ml Steenbeek, 2007, Goal attainment in paediatric rehabilitation: a critical review of the literature, Developmental Medicine & Child Neurology, 49, 550, 10.1111/j.1469-8749.2007.00550.x Stevens, 1946, On the theory of scales of measurement, Science, 103, 677, 10.1126/science.103.2684.677 Stochkendahl, 2006, Manual examination of the spine: a systematic critical literature review of reproducibility, Journal of Manipulative and Physiological Therapeutics, 29, 475, 10.1016/j.jmpt.2006.06.011 Stratford, 1997, Use of the standard error as a reliability index of interest: an applied example using elbow flexor strength data, Physical Therapy, 77, 745, 10.1093/ptj/77.7.745 Streiner, 2003, Clinimetrics versus psychometrics: an unnecessary distinction, Journal of Clinical Epidemiology, 56, 1142, 10.1016/j.jclinepi.2003.08.011 Streiner, 2008 Suen, 1988, Agreement, reliability, accuracy, and validity: toward a clarification, Behavioral Assessment, 10, 343 Sun, 1997, Reliability and validity of clinical outcome measurements of osteoarthritis of the hip and knee – a review of the literature, Clinical Rheumatology, 16, 185, 10.1007/BF02247849 Swingler, 2001, Observer variation in chest radiography of acute lower respiratory infections in children: a systematic review, BMC Medical Imaging, 1, 1, 10.1186/1471-2342-1-1 Szklo, 2007 Terwee, 2007, Quality criteria were proposed for measurement properties of health status questionnaires, Journal of Clinical Epidemiology, 60, 34, 10.1016/j.jclinepi.2006.03.012 Thomsen, 2002, Kappa statistics in the assessment of observer variation: the significance of multiple observers classifying ankle fractures, Journal of Orthopaedic Science, 7, 163, 10.1007/s007760200028 Topf, 1988, Interrater reliability decline under covert assessment, Nursing Research, 37, 47, 10.1097/00006199-198801000-00010 Uebersax, J., 2002. Raw Agreement Indices. Retrieved April 1, 2008, from http://ourworld.compuserve.com.homepages/jsuebersax/raw.htm. Vach, 2005, The dependence of Cohen's kappa on the prevalence does not matter, Journal of Clinical Epidemiology, 58, 655, 10.1016/j.jclinepi.2004.02.021 Vacha-Haase, 1998, Reliability generalisation: exploring variance in measurement error affecting score reliability across studies, Educational and Psychological Measurement, 58, 6, 10.1177/0013164498058001002 Vella, 2000, Use of consensus development to establish national priorities in critical care, BMJ, 320, 976, 10.1136/bmj.320.7240.976 Vikström, 2007, Mapping the categories of the Swedish primary health care version of ICD-10 to SNOMED CT concepts: rule development and intercoder reliability in a mapping trial, BMC Medical Informatics and Decision Making, 7, 9, 10.1186/1472-6947-7-9 Walter, 1998, Sample size and optimal designs for reliability studies, Statistics in Medicine, 17, 101, 10.1002/(SICI)1097-0258(19980115)17:1<101::AID-SIM727>3.0.CO;2-E Whitfield, 1949, Intra-class rank correlation, Biometrika, 36, 463, 10.1093/biomet/36.3-4.463 Wickström, 2000, The “Hawthorne effect”: what did the original Hawthorne studies actually show?, Scandinavian Journal of Work, Environment & Health, 26, 363, 10.5271/sjweh.555 Zegers, 2010, The inter-rater agreement of retrospective assessments of adverse events does not improve with two reviewers per patient record, Journal of Clinical Epidemiology, 63, 94, 10.1016/j.jclinepi.2009.03.004