NSRR staff
Boston, MA
0000-0002-0506-8368
Matt: Sure, we can plan to do that for consistency's sake in MrOS/SOF in our next releases. Thanks for raising the issue.
Yep, these are missing codes set by the UCSF group. These were originally SAS missing codes (i.e. .G, .H), which come out as characters in the CSV exports. G and H correspond to values that were scrubbed at the low and high extremes. Unfortunately, we won't be able to get the ages of these subjects.
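For anyone reading the CSV exports programmatically, here is a minimal sketch of one way to handle these letter codes. It assumes the codes appear as the bare characters G and H, and that G marks the low extreme and H the high extreme (my reading of the sentence above). The column name `age`, the sample rows, and the helper `parse_value` are made up for illustration.

```python
import csv
import io

# Letter codes left over from SAS special missing values (.G, .H).
# Assumption: G = scrubbed at the low extreme, H = scrubbed at the high extreme.
SCRUBBED_CODES = {"G": "scrubbed_low", "H": "scrubbed_high"}


def parse_value(raw):
    """Return (value, reason): a float and None, or None and why it is missing."""
    text = raw.strip()
    if text == "":
        return None, "blank"
    code = text.upper()
    if code in SCRUBBED_CODES:
        return None, SCRUBBED_CODES[code]
    return float(text), None


# Made-up rows for illustration; "age" is a hypothetical column name.
sample = io.StringIO("id,age\n1,72\n2,H\n3,G\n4,\n")
parsed = [parse_value(row["age"]) for row in csv.DictReader(sample)]
print(parsed)
```

Keeping the reason next to the value lets you tell a scrubbed extreme apart from an ordinary blank, instead of collapsing both into a single missing value.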
We will clarify in the next version of the data dictionary. Thanks!
Winda,
There hasn't been any movement on my end toward requesting the SHHS diet/activity data from the individual cohorts.
The topic of soliciting outside data sources for deposition on sleepdata.org was discussed at the user group meeting in October. We agreed to go this route, with our test case coming from Dr. Peppard in Wisconsin. I expect we will spend 2016 hammering out and refining this process, at which point we will likely reach out to other groups to try to bring them on as collaborators/sharers.
If you wanted to explore obtaining these data on your own, I think BioLINCC would be a good place to start. You can find most of the cohorts that took part in SHHS there, e.g.
https://biolincc.nhlbi.nih.gov/studies/chs/?q=chs
https://biolincc.nhlbi.nih.gov/studies/aric/?q=aric
https://biolincc.nhlbi.nih.gov/studies/framcohort/?q=framingham
Nice find -- thanks!
Matt,
I am 95% certain the PSG measurements (i.e. EDF/XML and dataset we have on NSRR) correspond with Visit 8.
I took a glance at 'v8sleep.zip' and 'v9sleep.zip' from here: http://sof.ucsf.edu/interface/DataDoc.asp. The "additional sleep measures" in V9 are derived from a night of Masimo Oximeter data, which were also read and scored in our Reading Center. The variables in the V8 sleep dataset align with the variables we generate for our full PSG datasets.
Thanks for checking out the site!
I can confirm that the epochs in the SHHS sleep staging annotations correspond to 30-second windows. The same holds for the other datasets for which we currently have EDF and XML annotation files posted.
For SHHS, I found mention of using the 30-second windows for scoring deep within one of the manuals: https://www.sleepdata.org/datasets/shhs/pages/mop/6-610-mop-overview-of-scoring.md
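To make the 30-second convention concrete, here is a small sketch that maps an epoch number to its time window in the recording. It assumes epochs are numbered from 1 and that epoch 1 begins at the start of the recording; neither is confirmed above, so check against the annotation file you are using. The function names are hypothetical.

```python
EPOCH_SECONDS = 30  # epoch length confirmed above


def epoch_window(epoch_number):
    """Return (start, end) in seconds from recording start.

    Assumes 1-based epoch numbers, with epoch 1 starting at second 0.
    """
    if epoch_number < 1:
        raise ValueError("epoch numbers start at 1")
    start = (epoch_number - 1) * EPOCH_SECONDS
    return start, start + EPOCH_SECONDS


def epochs_to_hours(n_epochs):
    """Convert a count of epochs to a duration in hours."""
    return n_epochs * EPOCH_SECONDS / 3600


print(epoch_window(1))
print(epoch_window(121))
print(epochs_to_hours(960))
```

Under these assumptions, epoch 121 covers seconds 3600 to 3630, and 960 epochs add up to 8 hours.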
The 'shhs1' dataset does contain 5,804 records (corresponding to 5,804 overnight sleep studies), but only 5,793 usable EDFs were retrieved from the SHHS archives for posting on sleepdata.org. Unfortunately, those other 11 EDF records were lost over time, most likely due to data corruption at some point long ago.
Nothing comes to mind for me for your request, but I have reached out to some of our signal experts to see if they have any ideas. Thanks for checking out the NSRR!
Alexander,
Thanks for raising this issue -- it is an important one. Some documentation is missing that would have helped you understand the missingness in cai4p and oahi. These variables have been filtered, and many values have been censored from the dataset. The bigger issue is that we don't have documentation on sleepdata.org that describes which filters have been applied and to which variables. For SHHS, we are mostly in the dark because the original (filtered) analytic datasets were generated 20 years ago and I have not come across the data processing code to know exactly what was done. The task of reverse engineering all the filters and documenting them has been on my back burner for a while now.
Based on prior experience, I made an educated guess that cai4p was filtered by chestqual (quality of chest signal) and abdoqual (quality of abdomen signal), and this seems to be correct. The signal quality variables in SHHS1 run from 1 (lowest) to 4 (highest), and some quick tinkering led me to this formula:
if chestqual in (3,4) and abdoqual in (3,4) then cai4p_new = 60 * ( carbp4 + carop4 + canbp4 + canop4 ) / slpprdp;
cai4p_new then has 4,406 valid values and 1,398 missing values, like the cai4p variable you are working with.
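For readers not working in SAS, the same guess can be sketched in Python. This is an illustration of the formula above, not the original processing code, which has not been located. It uses the variable name chestqual as given in the prose, assumes slpprdp is sleep time in minutes (which is why the event count is multiplied by 60), and treats a missing input as missing output, the way SAS arithmetic does. The sample row is invented.

```python
GOOD_QUALITY = {3, 4}  # signal quality runs from 1 (lowest) to 4 (highest)
EVENT_COUNTS = ("carbp4", "carop4", "canbp4", "canop4")


def cai4p_new(row):
    """Recompute the central apnea index per hour, or None if censored.

    `row` is a dict of SHHS1 variables. Returns None when either effort
    signal is below quality 3, or when any input is missing or unusable.
    """
    if row.get("chestqual") not in GOOD_QUALITY:
        return None
    if row.get("abdoqual") not in GOOD_QUALITY:
        return None
    counts = [row.get(name) for name in EVENT_COUNTS]
    sleep_minutes = row.get("slpprdp")  # assumed to be minutes of sleep
    if any(c is None for c in counts) or not sleep_minutes:
        return None
    return 60 * sum(counts) / sleep_minutes


# Invented values for illustration only.
example = {"chestqual": 4, "abdoqual": 3, "carbp4": 2, "carop4": 1,
           "canbp4": 0, "canop4": 3, "slpprdp": 360}
print(cai4p_new(example))
print(cai4p_new({**example, "abdoqual": 2}))
```

The second call returns None because the abdomen signal quality falls below 3, which is the censoring behavior described above.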
These filters were applied with the mindset of only retaining AHI values where the corresponding scoring signals (e.g. effort channels for indices of central sleep apnea) were of good or better quality. I will work with my colleagues here to try to prioritize writing some documentation that describes this (currently) "hidden" filtering and/or reverse engineering some of these filters and presenting the filtering code alongside the calculation.
Thanks for checking out the site and bringing this topic to the forum!
I am not familiar with any datasets like that. I will ask a couple of people here who know more about our PSG archives to comment if they have any insight.