Initial Assessment of SHHS-2 Scorer Reliability (reviewed by PSG Committee 4/01)
The following summarizes part of our quality assurance and certification procedures for scorer reliability. In early 1995, a formal scoring reliability study was designed (Whitney et al.; Sleep) in which each original scorer (912 {part-time scorer}, 914, 915) scored 20 studies twice over a 6.5-month interval. Half of these records (10 per scorer) were also scored at the second time point by the other scorers. A random sample of these records was rescored over the course of SHHS to document drift over time. All of these records came from the first 500 SHHS-1 studies and were, overall, of poorer quality (especially EEG and oximetry) than subsequent records. The only scorer from the original scoring reliability study who is scoring for SHHS-2 is 915. However, scorer 916 joined SHHS-1 very early in that study (towards the end of the reliability study initiation) and has been involved in SHHS since then. Scorer 922 joined SHHS-2 in November 2000.
Time points: 1 = Jan-April 1996; 2 = Oct 1996-Jan 1997; 3 = June 2000; 4 = Jan 2001
Intraclass Correlation Coefficient by Time Point
Measure | June 2000 (n=10) | January 2001 (n=10) |
---|---|---|
Scorer ID | 914,915,916 | 915,916,922 |
RDI | .97 | .99 |
AI | .69 | .75 |
% Stage 1 | .71 | .70 |
% Stage 2 | .90 | .92 |
% Stage 3/4 | .93 | .94 |
% REM | .93 | .88 |
Total Sleep Time | .99 | .97 |
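The report does not say which form of the intraclass correlation coefficient was used. Purely as an illustration, the Python sketch below computes the one-way random-effects form, ICC(1,1), from a records-by-scorers matrix. The function name `icc_oneway` and the sample values are hypothetical; they are not SHHS data and are not expected to reproduce the figures in the table.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC(1,1).

    ratings: list of rows, one row per record, one column per scorer.
    Assumes every record was scored by the same number of scorers.
    """
    n = len(ratings)           # number of records
    k = len(ratings[0])        # number of scorers per record
    row_means = [sum(row) / k for row in ratings]
    grand_mean = sum(row_means) / n

    # Between-record and within-record mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical RDI values for 4 records scored by 3 scorers (not SHHS data).
example_rdi = [
    [5.0, 6.0, 5.5],
    [20.0, 22.0, 21.0],
    [12.0, 11.0, 13.0],
    [40.0, 38.0, 41.0],
]
print(f"ICC(1,1) = {icc_oneway(example_rdi):.3f}")
```

Other ICC forms (for example two-way models that separate out a systematic scorer effect) would give somewhat different values for the same data.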
Intra Rater Reliability for Each Scorer Across Time
Scorer | 912 | 914 | 915 | 916 |
---|---|---|---|---|
Subjects, n | 9 | 4 | 10 | 10 |
Time points | 2 | 2 | 3* | 2 |
RDI | .96 | .99 | .97 | .99 |
AI | .70 | .72 | .77 | .75 |
% Stage 1 | .52 | .78 | .87 | .75 |
% Stage 2 | .72 | .78 | .85 | .94 |
% Stage 3/4 | .87 | .95 | .86 | .98 |
% REM | .91 | .91 | .96 | .90 |
Total Sleep Time | .96 | .98 | .99 | .99 |
* Only 4 subjects at Time 2
June 2000: Scorers 914, 915, and 916 participated in an exercise designed to evaluate arousal reliability in a contemporary data set. A total of 1040 epochs were selected from 40 records. Each record segment was scored twice by each scorer, with the rescoring done within 2 weeks of the original scoring. (Note: YY means that the scorer identified an arousal on a given epoch at both time points; YN means the scorer identified the arousal the first time but not the second; NY means the second time but not the first; NN means at neither time point.)
Table 1: Joint Classification of Arousal Reliability Data between Raters 914 and 915
Rater 914 (rows) / Rater 915 (columns) | YY | YN | NY | NN | Total |
---|---|---|---|---|---|
YY | 106 | 8 | 8 | 12 | 134 |
YN | 7 | 1 | 3 | 11 | 22 |
NY | 15 | 3 | 1 | 20 | 39 |
NN | 21 | 18 | 9 | 797 | 845 |
Total | 149 | 30 | 21 | 840 | 1040 |
Table 2: Joint Classification of Arousal Reliability Data between Raters 914 and 916
Rater 914 (rows) / Rater 916 (columns) | YY | YN | NY | NN | Total |
---|---|---|---|---|---|
YY | 109 | 8 | 5 | 12 | 134 |
YN | 8 | 2 | 2 | 10 | 22 |
NY | 13 | 4 | 3 | 19 | 39 |
NN | 16 | 13 | 6 | 810 | 845 |
Total | 146 | 27 | 16 | 851 | 1040 |
Table 3: Joint Classification of Arousal Reliability Data between Raters 915 and 916
Rater 915 (rows) / Rater 916 (columns) | YY | YN | NY | NN | Total |
---|---|---|---|---|---|
YY | 118 | 10 | 6 | 15 | 149 |
YN | 7 | 8 | 2 | 13 | 30 |
NY | 9 | 2 | 2 | 8 | 21 |
NN | 12 | 7 | 6 | 815 | 840 |
Total | 146 | 27 | 16 | 851 | 1040 |
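Each joint table above cross-classifies two raters over both scorings, so it can be collapsed into an ordinary 2x2 agreement table for a single time point. The Python sketch below does this for Table 1 (raters 914 and 915). The names `LABELS`, `TABLE_1`, and `collapse` are illustrative, and the report does not state whether the inter-rater figures in Table 4 were derived this way.

```python
# Row/column order used in Tables 1-3; first letter = Time 1, second = Time 2.
LABELS = ("YY", "YN", "NY", "NN")

# Table 1 cell counts: rows are rater 914, columns are rater 915.
TABLE_1 = [
    [106, 8, 8, 12],
    [7, 1, 3, 11],
    [15, 3, 1, 20],
    [21, 18, 9, 797],
]


def collapse(joint, time_index):
    """Collapse a 4x4 joint table to 2x2 counts at one time point.

    time_index 0 selects the first scoring, 1 the second.
    Returns (both_yes, row_rater_only, col_rater_only, neither).
    """
    both = row_only = col_only = neither = 0
    for row_label, row in zip(LABELS, joint):
        for col_label, count in zip(LABELS, row):
            row_yes = row_label[time_index] == "Y"
            col_yes = col_label[time_index] == "Y"
            if row_yes and col_yes:
                both += count
            elif row_yes:
                row_only += count
            elif col_yes:
                col_only += count
            else:
                neither += count
    return both, row_only, col_only, neither


print("Time 1:", collapse(TABLE_1, 0))  # (122, 34, 57, 827)
print("Time 2:", collapse(TABLE_1, 1))  # (130, 43, 40, 827)
```

Both collapsed tables sum to the 1040 epochs in the exercise.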
Table 4: Estimates of Intra- and Inter-Rater Agreement Measured by Cohen's Kappa
Intra-Rater Agreement
Rater 914 | Rater 915 | Rater 916 |
---|---|---|
0.78 (0.72, 0.82)* | 0.82 (0.77, 0.86) | 0.85 (0.80, 0.89) |
Inter-Rater Agreement
Raters 914/915 | Raters 914/916 | Raters 915/916 |
---|---|---|
0.70 (0.65, 0.75) | 0.73 (0.68, 0.78) | 0.76 (0.72, 0.80) |
* 95% confidence intervals
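The intra-rater estimates in Table 4 can be checked against the marginal totals of the joint tables, which give each rater's own YY/YN/NY/NN counts across the two scorings. The Python sketch below assumes the standard two-category Cohen's kappa; the function name `cohen_kappa` is illustrative, and the confidence intervals are not reproduced here.

```python
def cohen_kappa(yy, yn, ny, nn):
    """Cohen's kappa for two binary ratings of the same epochs.

    yy: arousal marked both times; yn: first time only;
    ny: second time only; nn: neither time.
    """
    total = yy + yn + ny + nn
    observed = (yy + nn) / total          # proportion of epochs in agreement
    first_yes = yy + yn                   # arousals marked at the first scoring
    second_yes = yy + ny                  # arousals marked at the second scoring
    # Agreement expected by chance from the two marginal rates.
    expected = (
        first_yes * second_yes + (total - first_yes) * (total - second_yes)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Marginal totals: 914 from the Table 1 row totals, 915 from the Table 1
# column totals, 916 from the Table 2 column totals.
intra_counts = {
    "914": (134, 22, 39, 845),
    "915": (149, 30, 21, 840),
    "916": (146, 27, 16, 851),
}

for rater, counts in intra_counts.items():
    print(rater, round(cohen_kappa(*counts), 2))
```

Rounded to two decimals, these come out to 0.78, 0.82, and 0.85, matching the intra-rater row of Table 4.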