I have a follow-up on this. As Adam wrote above, there were extra race categories in earlier versions of the SHHS1 dataset. Collapsing the categories seems fine to me, but I'm more worried about the new ethnicity variable in the newer SHHS datasets. It makes it seem as though the data were collected under modern NIH conventions, asking separately about race (Black/White/Asian/Native American/etc.) AND ethnicity (Hisp / Non-Hisp). There were 280 subjects with race = Hisp in 0.2.0, and there are 280 with ethnicity = Hisp in 0.8.0. There were 4907 subjects with race = white in 0.2.0, and there are 4907 with race = white in 0.8.0. This means that in the latest dataset, every subject coded as ethnicity = Hispanic is also coded as non-white. That accurately reflects the original dataset, but not how ethnicity is currently defined, where Hispanic respondents can identify with any race. What was the reason for splitting the race category into separate race and ethnicity variables?
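A quick way to confirm the overlap described above is to cross-tabulate race against ethnicity and look at the (White, Hispanic) cell. This is only a sketch: the column names `race` and `ethnicity` and the string labels are assumptions, and the real SHHS files may use numeric codes that need mapping first.

```python
from collections import Counter
from typing import Iterable, Mapping


def race_by_ethnicity(rows: Iterable[Mapping[str, str]],
                      race_col: str = "race",
                      eth_col: str = "ethnicity") -> Counter:
    """Count (race, ethnicity) pairs so overlaps such as White + Hispanic are visible."""
    return Counter((row[race_col], row[eth_col]) for row in rows)


# Toy rows with hypothetical labels, not real SHHS codes or data.
rows = [
    {"race": "White", "ethnicity": "Not Hispanic"},
    {"race": "Other", "ethnicity": "Hispanic"},
    {"race": "Black", "ethnicity": "Not Hispanic"},
]
table = race_by_ethnicity(rows)
print(table[("White", "Hispanic")])  # 0 in this toy example
```

On the real file, the same function could be called on `csv.DictReader(f)` for the opened dataset CSV; a zero in the (White, Hispanic) cell would match the counts reported above.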
Do you have similar plans to recode ages >89 to 90 in the other datasets? SOF and MrOS still have ages >89 in their datasets.
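For reference, the recoding asked about here is a simple top-code. A minimal sketch, assuming ages are numeric and missing values are represented as `None`:

```python
from typing import Optional

AGE_CAP = 90  # every age above 89 is reported as 90


def topcode_age(age: Optional[float], cap: int = AGE_CAP) -> Optional[float]:
    """Recode any age above cap - 1 to cap; missing values (None) pass through."""
    if age is None:
        return None
    return cap if age > cap - 1 else age


ages = [72, 89, 91, 103, None]
print([topcode_age(a) for a in ages])  # [72, 89, 90, 90, None]
```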
There are 6 subjects in the SOF dataset (sof-visit-8-dataset-0.3.0.csv) with ages (variable v8age) coded as "G" or "H". Any idea what this means? I have not yet tried to cross-reference against data from SOF Online (http://sof.ucsf.edu/interface/Introduction.asp).
In the online histogram, these show up as n=6 in the 0-8 age bin.
In the data dictionary, there is no indication of non-numeric codes (sof-data-dictionary-0.3.1-variables.csv):
Administrative v8age Age numeric years
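Since the dictionary lists v8age as numeric, codes like these can be found by trying to parse each value. A minimal sketch using only the standard library; it treats blank cells as missing and numbers data rows starting at 1 (header excluded), both of which are assumptions about how the CSV is laid out.

```python
from typing import Iterable, List, Mapping, Tuple


def non_numeric_values(rows: Iterable[Mapping[str, str]],
                       column: str) -> List[Tuple[int, str]]:
    """Return (data_row_number, raw_value) for entries that do not parse as numbers.

    Blank cells are treated as missing and skipped.
    """
    flagged = []
    for i, row in enumerate(rows, start=1):
        raw = (row.get(column) or "").strip()
        if not raw:
            continue  # missing value, not a code
        try:
            float(raw)
        except ValueError:
            flagged.append((i, raw))
    return flagged


# Toy rows, not real SOF data.
rows = [{"v8age": "84"}, {"v8age": "G"}, {"v8age": ""}, {"v8age": "H"}]
print(non_numeric_values(rows, "v8age"))  # [(2, 'G'), (4, 'H')]
```

On the real file, this could be run as `non_numeric_values(csv.DictReader(f), "v8age")` with `f` opened on sof-visit-8-dataset-0.3.0.csv.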
Let's bump up your 95% to 100%. I found the visit descriptions here: http://sof.ucsf.edu/Docs/Complete_description_of_each_visit.pdf.
Visit 8 was the PSG:
Plus standard questionnaires:
Sleep patterns: Pittsburgh Sleep Quality Index, Epworth Sleepiness Scale, Functional Outcomes of Sleep Questionnaire, minutes until asleep, time wake up/fall asleep, hours per night, problems associated with, frequency
Visit 9 included the standard questionnaires for everyone, plus some specific follow-ups for the sleep subset.
Pittsburgh Sleep Quality Index, Epworth Sleepiness Scale, minutes until asleep, time wake up/fall asleep, hours per night, problems associated with, frequency
ADDITIONAL MEASURES FOR SLEEP/COGNITION VISIT
Average 4 nights/5 days actigraphy data: Octagonal Motionlogger SleepWatch (catalog no. 26.100)
Oximetry: Masimo Rad8 Signal Extraction Pulse
Sleep Patterns: Functional Outcomes of Sleep Questionnaire
Restless Legs Syndrome Questionnaire
Visual Acuity: Bailey-Lovie Exam
(SOF Details of Measurements, updated June 2013, p. 26 of 29)
Thanks for the clarification.
I'm trying to figure out when sleep was measured in the SOF dataset: was it Visit 8 or Visit 9? https://sleepdata.org/datasets/sof/pages states that “Sleep studies were completed on 461 SOF participants at Visit 8.” The SOF description, however, indicates Visit 9; http://sof.ucsf.edu/interface/DataNews.asp lists “Visit 9 (Year 20) questionnaire and exam data for entire cohort (additional sleep measures in subset).”
This is important because the SOF online variables are coded by visit number.
I think an interesting question here is whether RDIs defined by different desaturation cutoffs give different predictions for CVD outcomes. As Veda notes, there are many RDIs, from rdi0p to rdi5p, not to mention RDI variables broken down by sleep stage and position. The SDB paper posted by Dennis uses rdi4p, which has also been used in other SHHS papers over the years.
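To make the cutoff question concrete, here is a minimal sketch of how an RDI changes with the desaturation threshold. It assumes each scored respiratory event carries one associated desaturation (in %) and that an rdiXp-style index counts events with desaturation of at least X% per hour of sleep; the function name and inputs are hypothetical, and the exact SHHS definitions (e.g., how apneas versus hypopneas are handled) should be checked in the data dictionary.

```python
from typing import Dict, Sequence


def rdi_by_cutoff(desat_pcts: Sequence[float], sleep_hours: float,
                  cutoffs: Sequence[int] = (0, 3, 4, 5)) -> Dict[int, float]:
    """Events per hour of sleep, counting only events with desaturation >= each cutoff."""
    if sleep_hours <= 0:
        raise ValueError("sleep_hours must be positive")
    return {c: sum(1 for d in desat_pcts if d >= c) / sleep_hours for c in cutoffs}


# Toy night: four events over 2 hours of sleep (illustrative numbers only).
print(rdi_by_cutoff([2.0, 3.5, 4.0, 6.0], 2.0))  # {0: 2.0, 3: 1.5, 4: 1.0, 5: 0.5}
```

Because a higher cutoff only ever drops events, the indices are nested (rdi0p >= rdi3p >= rdi4p >= rdi5p for the same night), so any difference in CVD prediction comes from which participants cross clinical thresholds, not from independent measurements.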