Inter-rater agreement in sleep stage classification between centers with different backgrounds
Abstract
This study investigated inter-rater agreement between scorers from three centers with clinical (Marburg University, UMA) or research (German Aerospace Center, DLR, and Dortmund University, UDO) backgrounds. Additionally, the sleep scoring rules of the new AASM manual for the scoring of sleep and associated events were reviewed regarding possible implications for inter-rater agreement.
Each of the three centers contributed 20 nights. All 60 nights (37 subjects, 9 female, mean age ± SD = 41.8 ± 16.1 years) were scored by each center according to the rules of Rechtschaffen and Kales. Twenty subjects underwent obstructive sleep apnea (OSA) diagnosis; the remaining subjects participated in studies on the effects of traffic noise on sleep and were free of intrinsic sleep disorders.
According to kappa statistics, inter-rater agreement between the three centers was excellent in 38%, fair to good in 62% and never poor. Mean kappa values decreased in the order rapid eye movement sleep, wake, stage 2, slow wave sleep, stage 4, stage 1 and stage 3. Time spent in the different sleep stages was positively correlated with kappa values. Pairwise comparisons revealed that agreement on stage 1 was significantly worse for UDO, but for all other stages none of the centers deviated significantly from the other two. Analyses of Venn diagrams showed tendencies of UDO to score wake alone and of UMA to score stage 4 alone.
Differences between clinical and research centers were minor overall. Pairwise kappa comparisons of several centers/scorers, as well as Venn diagrams, may detect systematic deviations of single centers/scorers, who should consequently receive additional training. The revised AASM rules for sleep scoring will most likely increase inter-rater agreement, but future studies will have to confirm this.
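The pairwise kappa comparison described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names, the toy epoch labels and the cutoffs in `agreement_label` (above 0.75 excellent, 0.40 to 0.75 fair to good, below 0.40 poor) are assumptions chosen to match the wording of the abstract, which does not state the thresholds that were actually used.

```python
from collections import Counter


def cohen_kappa(scorer_a, scorer_b):
    """Cohen's kappa for two equally long sequences of epoch labels."""
    if len(scorer_a) != len(scorer_b) or not scorer_a:
        raise ValueError("both scorers must rate the same non-empty set of epochs")
    n = len(scorer_a)
    # Observed agreement: share of epochs given the same label by both scorers.
    p_observed = sum(a == b for a, b in zip(scorer_a, scorer_b)) / n
    # Chance agreement: product of the two scorers' marginal label frequencies.
    counts_a, counts_b = Counter(scorer_a), Counter(scorer_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        # Both scorers used one identical label throughout; treat as full agreement.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


def stage_kappa(scorer_a, scorer_b, stage):
    """Kappa for a single stage, scored as 'this stage' versus 'any other stage'."""
    return cohen_kappa([s == stage for s in scorer_a],
                       [s == stage for s in scorer_b])


def agreement_label(kappa):
    """Map kappa to a verbal category (assumed cutoffs, see the lead-in)."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"


# Hypothetical epoch-by-epoch scorings of one short recording by two centers.
center_1 = ["W", "W", "S2", "S2", "REM", "REM", "S1", "S2"]
center_2 = ["W", "S1", "S2", "S2", "REM", "REM", "S1", "S2"]

overall = cohen_kappa(center_1, center_2)
print(round(overall, 3), agreement_label(overall))
print(round(stage_kappa(center_1, center_2, "W"), 3))
```

Computing `stage_kappa` for every stage and every pair of centers yields the kind of per-stage, pairwise table from which a single deviating center could be spotted.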
References
Basner M, Isermann U, Samel A (2006) Aircraft noise effects on sleep: Application of the results of a large polysomnographic field study. J Acoust Soc Am 119(5):2772–2784
Basner M, Samel A (2005) Effects of nocturnal aircraft noise on sleep structure. Somnologie 9(2):84–95
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20(1):37–46
Collop NA (2002) Scoring variability between polysomnography technologists in different sleep laboratories. Sleep Med 3(1):43–47
Danker-Hopfe H, Herrmann W (2001) Interrater-Reliabilität visueller Schlafstadienklassifikation nach Rechtschaffen- und Kales-Regeln: Review und methodische Erwägungen. Klin Neurophysiol 32(2):89–99
Danker-Hopfe H, Kunz D, Gruber G, Klosch G, Lorenzo JL, Himanen SL, Kemp B, Penzel T, Roschke J, Dorn H, Schlogl A, Trenker E, Dorffner G (2004) Interrater reliability between scorers from eight European sleep laboratories in subjects with different sleep disorders. J Sleep Res 13(1):63–69
Ferri R, Ferri P, Colognola RM, Petrella MA, Musumeci SA, Bergonzi P (1989) Comparison between the results of an automatic and a visual scoring of sleep EEG recordings. Sleep 12(4):354–362
Fleiss J (1971) Measuring nominal scale agreement among many raters. Psychol Bull 76(5):378–382
Griefahn B, Marks A, Robens S (2006) Noise emitted from road, rail and air traffic and their effects on sleep. J Sound Vib 295(1–2):129–140
Iber C, Ancoli-Israel S, Chesson A, Quan SF (2007) The AASM manual for the scoring of sleep and associated events: rules, terminology and technical specifications, 1st edn. American Academy of Sleep Medicine, Westchester, Illinois
Kim Y, Kurachi M, Horita M, Matsuura K, Kamikawa Y (1993) Agreement of visual scoring of sleep stages among many laboratories in Japan: effect of a supplementary definition of slow wave on scoring of slow wave sleep. Japanese Journal of Psychiatry and Neurology 47(1):91–97
Kubicki S, Holler L, Berg I, Pastelak-Price C, Dorow R (1989) Sleep EEG evaluation: a comparison of results obtained by visual scoring and automatic analysis with the Oxford sleep stager. Sleep 12(2):140–149
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174
Norman RG, Pal I, Stewart C, Walsleben JA, Rapoport DM (2000) Interobserver agreement among sleep scorers from different centers in a large dataset. Sleep 23(7):901–908
Penzel T, Behler PG, von Buttlar M, Conradt R, Meier M, Möller A, Danker-Hopfe H (2003) Reliability of visual evaluation of sleep stages according to Rechtschaffen and Kales from eight polysomnographs by nine sleep centers. Somnologie 7(2):49–58
Perneger TV (1998) What’s wrong with Bonferroni adjustments. BMJ 316(7139):1236–1238
Rechtschaffen A, Kales A, Berger RJ, Dement WC, Jacobsen A, Johnson LC, Jouvet M, Monroe LJ, Oswald I, Roffwarg HP, Roth B, Walter RD (1968) A manual of standardized terminology, techniques and scoring system for sleep stages of human subjects. Public Health Service, U.S. Government Printing Office, Washington, D.C.
Schaltenbrand N, Lengelle R, Toussaint M, Luthringer R, Carelli G, Jacqmin A, Lainey E, Muzet A, Macher JP (1996) Sleep stage scoring using the neural network model: comparison between visual and automatic analysis in normal subjects and patients. Sleep 19(1):26–35
Silber MH, Ancoli-Israel S, Bonnet MH, Chokroverty S, Grigg-Damberger MM, Hirshkowitz M, Kapen S, Keenan SA, Kryger MH, Penzel T, Pressman MR, Iber C (2007) The visual scoring of sleep in adults. J Clin Sleep Med 3(2):121–131
Whitney CW, Gottlieb DJ, Redline S, Norman RG, Dodge RR, Shahar E, Surovec S, Nieto FJ (1998) Reliability of scoring respiratory disturbance indices and sleep staging. Sleep 21(7):749–757