Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed
References
Altaye, 2001, A general goodness-of-fit approach for inference procedures concerning the kappa statistic, Statistics in Medicine, 20, 2479, 10.1002/sim.911
American Educational Research Association, American Psychological Association, National Council on Measurement in Education, 1999. Standards for Educational and Psychological Testing. American Educational Research Association, Washington.
Audigé, 2005, A concept for the validation of fracture classifications, Journal of Orthopaedic Trauma, 19, 404, 10.1097/01.bot.0000155310.04886.37
Audigé, 2004, How reliable are reliability studies of fracture classifications?, Acta Orthopaedica Scandinavica, 75, 184, 10.1080/00016470412331294445
Bååth, 2008, Interrater reliability using Modified Norton Scale, Pressure Ulcer Card, Short Form-Mini Nutritional Assessment by registered and enrolled nurses in clinical practice, Journal of Clinical Nursing, 17, 618, 10.1111/j.1365-2702.2007.02131.x
Barlow, 1991, A comparison of methods for calculating a stratified kappa, Statistics in Medicine, 10, 1465, 10.1002/sim.4780100913
Barone, 2006, Should an Allen test be performed before radial artery cannulation?, The Journal of Trauma, 61, 468, 10.1097/01.ta.0000229815.43871.59
Bates-Jensen, 2008, Subepidermal moisture differentiates erythema and stage I pressure ulcers in nursing home residents, Wound Repair and Regeneration, 16, 189, 10.1111/j.1524-475X.2008.00359.x
Beckman, 2004, How reliable are assessments of clinical teaching?, Journal of General Internal Medicine, 19, 971, 10.1111/j.1525-1497.2004.40066.x
Bhat, 2005, Inter-rater reliability of delirium rating scales, Neuroepidemiology, 25, 48, 10.1159/000085440
Bland, 1990, A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement, Computers in Biology and Medicine, 20, 337, 10.1016/0010-4825(90)90013-F
Bland, 1999, Measuring agreement in method comparison studies, Statistical Methods in Medical Research, 8, 135, 10.1191/096228099673819272
Bonett, 2002, Sample size requirements for estimating intraclass correlations with desired precision, Statistics in Medicine, 21, 1331, 10.1002/sim.1108
Bossuyt, 2003, The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration, Annals of Internal Medicine, 138, W1, 10.7326/0003-4819-138-1-200301070-00012-w1
Bot, 2004, Clinimetric evaluation of shoulder disability questionnaires: a systematic review, Annals of the Rheumatic Diseases, 63, 335, 10.1136/ard.2003.007724
Bours, 1999, The development of a National Registration Form to measure the prevalence of pressure ulcers in the Netherlands, Ostomy Wound Management, 45, 28
Brorson, 2008, Training improves agreement among doctors using the Neer system for proximal humeral fractures in a systematic review, Journal of Clinical Epidemiology, 61, 7, 10.1016/j.jclinepi.2007.04.014
Buntinx, 1996, Inter-observer variation in the assessment of skin ulceration, Journal of Wound Care, 5, 166, 10.12968/jowc.1996.5.4.166
Cantor, 1996, Sample-size calculations for Cohen's kappa, Psychological Methods, 1, 150, 10.1037/1082-989X.1.2.150
Charter, 2003, Combining reliability coefficients: possible application to meta-analysis and reliability generalization, Psychological Reports, 93, 643, 10.2466/pr0.2003.93.3.643
Cicchetti, 1999, Sample size requirements for increasing the precision of reliability estimates: problems and proposed solutions, Journal of Clinical and Experimental Neuropsychology, 21, 567, 10.1076/jcen.21.4.567.886
Cicchetti, 2001, The precision of reliability and validity estimates re-visited: distinguishing between clinical and statistical significance of sample size requirements, Journal of Clinical and Experimental Neuropsychology, 23, 695, 10.1076/jcen.23.5.695.1249
Cicchetti, 2006, Rating scales, scales of measurement, issues of reliability, The Journal of Nervous and Mental Disease, 194, 557, 10.1097/01.nmd.0000230392.83607.c5
Colle, 2002, Impact of quality scales on levels of evidence inferred from a systematic review of exercise therapy and low back pain, Archives of Physical Medicine and Rehabilitation, 83, 1745, 10.1053/apmr.2002.35657
Darroch, 1986, Category distinguishability and observer agreement, Australian & New Zealand Journal of Statistics, 28, 371, 10.1111/j.1467-842X.1986.tb00709.x
De Vet, 2006, When to use agreement versus reliability measures, Journal of Clinical Epidemiology, 59, 1033, 10.1016/j.jclinepi.2005.10.015
De Villiers, 2005, The Delphi technique in health sciences education research, Medical Teacher, 27, 639, 10.1080/13611260500069947
Defloor, 2004, Inter-rater reliability of the EPUAP pressure ulcer classification system using photographs, Journal of Clinical Nursing, 13, 952, 10.1111/j.1365-2702.2004.00974.x
D’Olhaberriague, 1996, A reappraisal of reliability and validity studies in stroke, Stroke, 27, 2331, 10.1161/01.STR.27.12.2331
Donner, 1987, Sample size requirements for reliability studies, Statistics in Medicine, 6, 441, 10.1002/sim.4780060404
Donner, 1992, A goodness-of-fit approach to inference procedures for the kappa statistic: confidence interval construction, significance testing and sample size estimation, Statistics in Medicine, 11, 1511, 10.1002/sim.4780111109
Donner, 1994, Statistical implications of the choice between a dichotomous or continuous trait in studies of interobserver agreement, Biometrics, 50, 550, 10.2307/2533400
Donner, 1996, The statistical analysis of kappa statistics in multiple samples, Journal of Clinical Epidemiology, 49, 1053, 10.1016/0895-4356(96)00057-1
Dunn, 2004
Elkum, 2008, Signal-to-noise ratio (SNR) as a measure of reproducibility: design, estimation, and application, Health Services and Outcomes Research Methodology, 8, 119, 10.1007/s10742-008-0030-2
European Pressure Ulcer Advisory Panel, 2005. EPUAP Statement on Prevalence and Incidence Monitoring of Pressure Ulcer Occurrence 2005. Retrieved March 8, 2009, from http://www.epuap.org/review6_3/page5.html.
Feinstein, 1990, High agreement but low kappa: I. The problems of two paradoxes, Journal of Clinical Epidemiology, 43, 543, 10.1016/0895-4356(90)90158-L
Fink, 1984, Consensus methods: characteristics and guidelines for use, American Journal of Public Health, 74, 979, 10.2105/AJPH.74.9.979
Fleiss, 2003
Gajewski, 2007, Inter-rater reliability of pressure ulcer staging: probit Bayesian Hierarchical Model that allows for uncertain rater response, Statistics in Medicine, 26, 4602, 10.1002/sim.2877
Gardner, 1986, Confidence intervals rather than P values: estimation rather than hypothesis testing, British Medical Journal, 292, 746, 10.1136/bmj.292.6522.746
Giraudeau, 2001, Planning a reproducibility study: how many subjects and how many replicates per subject for an expected width of the 95 percent confidence interval of the intraclass correlation coefficient, Statistics in Medicine, 20, 3205, 10.1002/sim.935
Gjørup, 1988, The kappa coefficient and the prevalence of a diagnosis, Methods of Information in Medicine, 27, 184, 10.1055/s-0038-1635539
Glaser, 1980, Using behavioral science strategies for defining the state-of-the-art, Journal of Applied Behavioral Science, 16, 79, 10.1177/002188638001600107
Gould, 2004, Examining the validity of pressure ulcer risk assessment scales: a replication study, International Journal of Nursing Studies, 41, 331, 10.1016/j.ijnurstu.2003.10.005
Gouttebarge, 2004, Reliability and validity of functional capacity evaluation methods: a systematic review with reference to Blankenship system, Ergos work simulator, Ergo-Kit and Isernhagen work system, International Archives of Occupational and Environmental Health, 77, 527, 10.1007/s00420-004-0549-7
Gwet, 2008, Computing inter-rater reliability and its variance in the presence of high agreement, The British Journal of Mathematical and Statistical Psychology, 61, 29, 10.1348/000711006X126600
Hall, 2008, Intertester reliability and diagnostic validity of the cervical flexion-rotation test, Journal of Manipulative and Physiological Therapeutics, 31, 293, 10.1016/j.jmpt.2008.03.012
Hart, 2006, Reliability testing of the national database of nursing quality indicators pressure ulcer indicator, Journal of Nursing Care Quality, 21, 256, 10.1097/00001786-200607000-00011
Hestbaek, 2000, Are chiropractic tests for the lumbo-pelvic spine reliable and valid? A systematic review, Journal of Manipulative and Physiological Therapeutics, 23, 258, 10.1016/S0161-4754(00)90173-8
House, 1981, Measures of interobserver agreement: calculation formulas and distribution effects, Journal of Behavioral Assessment, 3, 37, 10.1007/BF01321350
Hutchings, 2006, A comparison of formal consensus methods used for developing clinical guidelines, Journal of Health Services Research & Policy, 11, 218, 10.1258/135581906778476553
Hwang, 2006, Representation of ophthalmology concepts by electronic systems: intercoder agreement among physicians using controlled terminologies, Ophthalmology, 113, 511, 10.1016/j.ophtha.2006.01.017
Innes, 1999, Reliability of work-related assessments, Work, 13, 107
Kadam, 2006, A comparison of two consensus methods for classifying morbidities in a single professional group showed the same outcomes, Journal of Clinical Epidemiology, 59, 1169, 10.1016/j.jclinepi.2006.02.016
Kirshner, 1985, A methodological framework for assessing health indices, Journal of Chronic Diseases, 38, 27, 10.1016/0021-9681(85)90005-0
Kobak, 2004, Rater training in multicenter clinical trials: issues and recommendations, Journal of Clinical Psychopharmacology, 24, 113, 10.1097/01.jcp.0000116651.91923.54
Kobak, 2005, A new approach to rater training and certification in a multicenter clinical trial, Journal of Clinical Psychopharmacology, 25, 407, 10.1097/01.jcp.0000177666.35016.a0
Kobak, 2008, A comparison of face-to-face and remote assessment of inter-rater reliability on the Hamilton Depression Rating Scale via videoconferencing, Psychiatry Research, 158, 99, 10.1016/j.psychres.2007.06.025
Kottner, 2008, Interpreting interrater reliability coefficients of the Braden scale: a discussion paper, International Journal of Nursing Studies, 45, 1239, 10.1016/j.ijnurstu.2007.08.001
Kottner, 2008, An interrater reliability study of the Braden scale in two nursing homes, International Journal of Nursing Studies, 45, 1501, 10.1016/j.ijnurstu.2008.02.007
Kottner, 2009, Inter- and intrarater reliability of the Waterlow pressure sore risk scale: a systematic review, International Journal of Nursing Studies, 46, 369, 10.1016/j.ijnurstu.2008.09.010
Kottner, 2009, A systematic review of interrater reliability of pressure ulcer classification systems, Journal of Clinical Nursing, 18, 315, 10.1111/j.1365-2702.2008.02569.x
Kraemer, 1979, Ramifications of a population model for κ as a coefficient of reliability, Psychometrika, 44, 461, 10.1007/BF02296208
Kraemer, 1992, Measurement of reliability for categorical data in medical research, Statistical Methods in Medical Research, 1, 183, 10.1177/096228029200100204
Kraemer, 2002, Kappa coefficients in medical research, Statistics in Medicine, 21, 2109, 10.1002/sim.1180
Landis, 1977, The measurement of observer agreement for categorical data, Biometrics, 33, 159, 10.2307/2529310
Lee, 1989, Statistical evaluation of agreement between two methods for measuring a quantitative variable, Computers in Biology and Medicine, 19, 61, 10.1016/0010-4825(89)90036-X
Lewicki, 2000, Sensitivity and specificity of the Braden scale in the cardiac surgical population, Journal of Wound Ostomy and Continence Nursing, 27, 36
Maclure, 1987, Misinterpretation and misuse of the kappa statistic, American Journal of Epidemiology, 126, 161, 10.1093/aje/126.2.161
McAlister, 1999, Why we need large, simple studies of clinical examination: the problem and a proposed solution, Lancet, 354, 1721, 10.1016/S0140-6736(99)01174-5
McGraw, 1996, Forming inferences about some intraclass correlation coefficients, Psychological Methods, 1, 30, 10.1037/1082-989X.1.1.30
Mok, 2008, Comparison of observer variation in conventional and three digital radiographic methods used in the evaluation of patients with adolescent idiopathic scoliosis, Spine, 33, 681, 10.1097/BRS.0b013e318166aa8d
Müller, 1994, A critical discussion of intraclass correlation coefficients, Statistics in Medicine, 13, 2465, 10.1002/sim.4780132310
Mulsant, 2002, Interrater reliability in clinical trials of depressive disorders, The American Journal of Psychiatry, 159, 1598, 10.1176/appi.ajp.159.9.1598
Nanda, 2008, An assessment of the inter examiner reliability of clinical tests for subacromial impingement and rotator cuff integrity, European Journal of Orthopaedic Surgery and Traumatology, 18, 495, 10.1007/s00590-008-0341-6
Nunnally, 1994
Nwosu, 1998, Is real-time ultrasonic bladder volume estimation reliable and valid? A systematic overview, Scandinavian Journal of Urology and Nephrology, 32, 325, 10.1080/003655998750015278
Oot-Giromini, 1993, Pressure ulcer prevalence, incidence and associated risk factors in the community, Decubitus, 6, 24
Peat, 2005
Perkins, 2000, Penny-wise and pound-foolish: the impact of measurement error on sample size requirements in clinical trials, Biological Psychiatry, 47, 762, 10.1016/S0006-3223(00)00837-4
Polit, 2008
Ratanawongsa, 2008, The reported validity and reliability of methods for evaluating continuing medical education: a systematic review, Academic Medicine, 83, 274, 10.1097/ACM.0b013e3181637925
Richardson, 1972, Peer review of medical care, Medical Care, 10, 29, 10.1097/00005650-197201000-00004
Roberts, 1998, A matrix of kappa-type coefficients to assess the reliability of nominal scales, Statistics in Medicine, 17, 471, 10.1002/(SICI)1097-0258(19980228)17:4<471::AID-SIM745>3.0.CO;2-N
Roberts, 2005, Assessing the reliability of ordered categorical scales using kappa-type statistics, Statistical Methods in Medical Research, 14, 493, 10.1191/0962280205sm413oa
Rothery, 1979, A nonparametric measure of intraclass correlation, Biometrika, 66, 629, 10.1093/biomet/66.3.629
Rousson, 2002, Assessing intrarater, interrater and test–retest reliability of continuous measurements, Statistics in Medicine, 21, 3431, 10.1002/sim.1253
Sainsbury, 2005, Reliability of the Barthel Index when used with older people, Age and Ageing, 34, 228, 10.1093/ageing/afi063
Saito, 2006, Effective number of subjects and number of raters for inter-rater reliability studies, Statistics in Medicine, 25, 1547, 10.1002/sim.2294
Scinto, 2001, The case for comprehensive quality indicator reliability assessment, Journal of Clinical Epidemiology, 54, 1103, 10.1016/S0895-4356(01)00381-X
Shoukri, 2004
Shoukri, 2004, Sample size requirements for the design of reliability study: review and new results, Statistical Methods in Medical Research, 13, 251, 10.1191/0962280204sm365ra
Shrout, 1998, Measurement reliability and agreement in psychiatry, Statistical Methods in Medical Research, 7, 301, 10.1191/096228098672090967
Shrout, 1979, Intraclass correlations: uses in assessing rater reliability, Psychological Bulletin, 86, 420, 10.1037/0033-2909.86.2.420
Slongo, 2006, Journal of Pediatric Orthopedics, 26, 43, 10.1097/01.bpo.0000187989.64021.ml
Steenbeek, 2007, Goal attainment in paediatric rehabilitation: a critical review of the literature, Developmental Medicine & Child Neurology, 49, 550, 10.1111/j.1469-8749.2007.00550.x
Stevens, 1946, On the theory of scales of measurement, Science, 103, 677, 10.1126/science.103.2684.677
Stochkendahl, 2006, Manual examination of the spine: a systematic critical literature review of reproducibility, Journal of Manipulative and Physiological Therapeutics, 29, 475, 10.1016/j.jmpt.2006.06.011
Stratford, 1997, Use of the standard error as a reliability index of interest: an applied example using elbow flexor strength data, Physical Therapy, 77, 745, 10.1093/ptj/77.7.745
Streiner, 2003, Clinimetrics versus psychometrics: an unnecessary distinction, Journal of Clinical Epidemiology, 56, 1142, 10.1016/j.jclinepi.2003.08.011
Streiner, 2008
Suen, 1988, Agreement, reliability, accuracy, and validity: toward a clarification, Behavioral Assessment, 10, 343
Sun, 1997, Reliability and validity of clinical outcome measurements of osteoarthritis of the hip and knee – a review of the literature, Clinical Rheumatology, 16, 185, 10.1007/BF02247849
Swingler, 2001, Observer variation in chest radiography of acute lower respiratory infections in children: a systematic review, BMC Medical Imaging, 1, 1, 10.1186/1471-2342-1-1
Szklo, 2007
Terwee, 2007, Quality criteria were proposed for measurement properties of health status questionnaires, Journal of Clinical Epidemiology, 60, 34, 10.1016/j.jclinepi.2006.03.012
Thomsen, 2002, Kappa statistics in the assessment of observer variation: the significance of multiple observers classifying ankle fractures, Journal of Orthopaedic Science, 7, 163, 10.1007/s007760200028
Topf, 1988, Interrater reliability decline under covert assessment, Nursing Research, 37, 47, 10.1097/00006199-198801000-00010
Uebersax, J., 2002. Raw Agreement Indices. Retrieved April 1, 2008, from http://ourworld.compuserve.com/homepages/jsuebersax/raw.htm.
Vach, 2005, The dependence of Cohen's kappa on the prevalence does not matter, Journal of Clinical Epidemiology, 58, 655, 10.1016/j.jclinepi.2004.02.021
Vacha-Haase, 1998, Reliability generalisation: exploring variance in measurement error affecting score reliability across studies, Educational and Psychological Measurement, 58, 6, 10.1177/0013164498058001002
Vella, 2000, Use of consensus development to establish national priorities in critical care, BMJ, 320, 976, 10.1136/bmj.320.7240.976
Vikström, 2007, Mapping the categories of the Swedish primary health care version of ICD-10 to SNOMED CT concepts: rule development and intercoder reliability in a mapping trial, BMC Medical Informatics and Decision Making, 7, 9, 10.1186/1472-6947-7-9
Walter, 1998, Sample size and optimal designs for reliability studies, Statistics in Medicine, 17, 101, 10.1002/(SICI)1097-0258(19980115)17:1<101::AID-SIM727>3.0.CO;2-E
Whitfield, 1949, Intra-class rank correlation, Biometrika, 36, 463, 10.1093/biomet/36.3-4.463
Wickström, 2000, The “Hawthorne effect”: what did the original Hawthorne studies actually show?, Scandinavian Journal of Work, Environment & Health, 26, 363, 10.5271/sjweh.555
Zegers, 2010, The inter-rater agreement of retrospective assessments of adverse events does not improve with two reviewers per patient record, Journal of Clinical Epidemiology, 63, 94, 10.1016/j.jclinepi.2009.03.004