Thanks for checking out the resource. We have had hundreds of students sign up and receive access to datasets. Most data requests are reviewed and processed within 1-2 weeks of submission.
I see you submitted a data request yesterday. Your request is likely to be returned to you for lack of details in your Specific Purpose. Our goal in reviewing data requests is to adhere to the data sharing language laid out in the original informed consent documents signed by human research subjects. As such, we require incoming users to provide a description (typically 1 or 2 paragraphs will suffice) of what they are going to do with the data, e.g. what variables/topics are of interest, what statistical tests they plan to run, what machine learning methods will be used, etc.
Feel free to ask more questions here as you proceed through the data request process.
This is a challenge, especially in SHHS. There are noteworthy issues with the XML annotation files, as described on this documentation page. In particular, the SpO2 desaturation annotations and their linking to respiratory events were affected by version changes in the Profusion scoring and exporting software over time.
The line "Desaturation" is supposed to tell you the level of the associated desaturation, though this indication will be unreliable in SHHS (due to known issues).
There is another variable in the dataset, "rdi0p", which should give you a total event index (i.e. all apneas and all hypopneas, expressed per hour of sleep). The value for subject 200155 is 77.8, which corresponds to 166 total events. Some events may not be counted by the scoring software because they start or end in wake.
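To make the relationship between an index and a raw event count concrete, here is a minimal sketch. It assumes the index is simply total events divided by hours of sleep; the function names are hypothetical, and the sleep time is back-calculated from the two numbers quoted above rather than read from the dataset.

```python
def event_index(total_events: int, sleep_hours: float) -> float:
    """Events per hour of sleep (assumed definition of the index)."""
    if sleep_hours <= 0:
        raise ValueError("sleep_hours must be positive")
    return total_events / sleep_hours


def implied_sleep_hours(total_events: int, index: float) -> float:
    """Hours of sleep implied by an event count and an index."""
    if index <= 0:
        raise ValueError("index must be positive")
    return total_events / index


# Numbers quoted in this thread for subject 200155.
hours = implied_sleep_hours(166, 77.8)
print(f"implied sleep time: {hours:.2f} h")
print(f"index recomputed:   {event_index(166, hours):.1f} events/h")
```

If the count you get from the XML annotations does not reproduce the index with the sleep time in the dataset, events starting or ending in wake are one place to look.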
If you want to pursue this further, our suggestion is to implement your own desaturation detection algorithm to recreate the SHHS SpO2 desaturation events and their linking to respiratory events. We have done some work on this ourselves, though we aren't actively preparing those data for release.
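As a starting point, a desaturation detector could look something like the sketch below. This is not the Profusion or SHHS method. It is a deliberately simple illustration that assumes a clean SpO2 series at a fixed sample rate and flags any uninterrupted fall of at least `min_drop` percentage points. The function name, the 3% threshold, and the 120-second limit are all assumptions chosen for the example; a real signal would need artifact rejection and smoothing first.

```python
from typing import List, Sequence, Tuple


def detect_desaturations(
    spo2: Sequence[float],
    min_drop: float = 3.0,          # assumed threshold, in SpO2 percentage points
    max_fall_seconds: float = 120,  # assumed cap on how long a fall may last
    sample_rate_hz: float = 1.0,
) -> List[Tuple[int, int, float]]:
    """Return (start_index, nadir_index, drop) for each qualifying fall.

    A fall starts where the signal begins to decrease and is followed while
    the signal keeps going down or stays flat. The nadir index is the last
    sample of that descent.
    """
    events = []
    n = len(spo2)
    max_samples = int(max_fall_seconds * sample_rate_hz)
    i = 0
    while i < n - 1:
        if spo2[i + 1] >= spo2[i]:
            i += 1  # not falling here, keep scanning
            continue
        start = i
        limit = min(n, start + max_samples + 1)
        j = i + 1
        while j < limit and spo2[j] <= spo2[j - 1]:
            j += 1
        nadir = j - 1
        drop = spo2[start] - spo2[nadir]
        if drop >= min_drop:
            events.append((start, nadir, drop))
        i = max(nadir, start + 1)  # always move forward
    return events


# Made-up trace sampled at 1 Hz, for illustration only.
trace = [97, 97, 96, 95, 93, 92, 92, 94, 96, 97, 97, 96, 96, 97]
print(detect_desaturations(trace))
```

Linking each detected desaturation to a respiratory event would then be a matter of checking whether the fall begins within some window after the event ends; that window is another choice you would need to make and document.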
Thanks for checking out the resource.
Hello again - thanks for your patience.
For MESA - would you please re-download and check 2852 again? This study looks OK to us. This study was re-exported/fixed around the time of Brian's original data request, so perhaps you have an old copy.
For the other two MESA studies (1738, 6476), we traced the discrepancy back to data loss/corruption in the original scoring files, which caused these unexpected mismatches. We were unable to fix the scoring data at the source to match the data you see in the summary result file (CSV). I will make a note of this issue with the scored data export here: https://sleepdata.org/datasets/mesa/pages/polysomnography-introduction.md
I will look into the CFS issues next.
Thanks to you both. Please have Minsoo submit a data request for MESA, CFS, and whatever other datasets your team is working with. He can use the same sort of language from brianhoonsukbyun's request.
Your findings are not entirely unanticipated. We have encountered such discrepancies ourselves. Right now the NSRR team is undertaking a large-scale effort to review all our datasets for issues exactly like this. We will make corrections whenever possible or otherwise note that the issue exists (and possibly why).
I hope to have a chance to look at a handful of these specifically and report back some initial findings next week.
Thanks for bringing this to our attention. I will explore some of these discrepancies. Can you please answer a couple of questions?
I'm glad that helped. Thank you for the kind words. I will share your links with other members of the team and bring it up at our next group meeting.
We are familiar with CDEs and have toyed around a bit with linking NSRR data back to established CDEs. What you describe in your aims for HIV/AIDS sounds very similar to what we have contemplated for studies/trials of sleep.
The data dictionary CSV files are publicly available for each dataset. You'll find them in the Files area under the datasets folder, e.g.
The data dictionary files you see (domains, forms, variables) represent what you get from running the Spout export.
Tags as you describe do not exist, though we do have some data harmonization enhancements in mind for the future.
Thanks for your interest in the site!
I will make a note in the documentation regarding this, but it's possible that only the sleep period was thoroughly reviewed for alignment between the beat detector and the EKG signal. One of our technicians reviewed a couple of MESA studies and found that beats very early in the recording (e.g. the first 15 seconds) did not match as consistently as beats further into the recording. In mesa-0001, the very first beat in the R-point file appeared to be mismatched, though the following beats looked OK. We were comparing the "seconds" column in the R-point file to our visualization of the raw EKG signal (measuring the time from recording start to the apparent R peak).
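If you want to run the same kind of comparison programmatically, a rough sketch follows. It assumes you already have two lists of times in seconds from recording start: the annotated beats (e.g. the "seconds" column) and the R peaks you located yourself in the raw EKG. The function names and the 50 ms tolerance are hypothetical, and the numbers in the example are invented.

```python
from bisect import bisect_left
from typing import List, Sequence


def nearest_offsets(annotated: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Signed offset (annotated - nearest reference), in seconds, for each beat.

    `reference` must be sorted in ascending order.
    """
    if not reference:
        raise ValueError("reference must contain at least one peak time")
    offsets = []
    for t in annotated:
        k = bisect_left(reference, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = reference[max(k - 1, 0):k + 1]
        nearest = min(candidates, key=lambda r: abs(r - t))
        offsets.append(t - nearest)
    return offsets


def mismatched_beats(annotated, reference, tolerance_s: float = 0.05) -> List[int]:
    """Indices of annotated beats further than tolerance_s from any reference peak."""
    return [
        i for i, off in enumerate(nearest_offsets(annotated, reference))
        if abs(off) > tolerance_s
    ]


# Invented values: the first annotated beat is off, the rest line up.
annotated = [0.40, 1.21, 2.02, 2.80]
reference = [0.62, 1.20, 2.00, 2.81]
print(mismatched_beats(annotated, reference))
```

Restricting the check to the sleep period versus the whole recording should show whether the early-recording mismatches are the only ones.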
Let me know if you come across issues as you dig in further.
The SHHS/MrOS datasets may have some data on Parkinson's diagnosis and medication use, e.g. https://sleepdata.org/search?search=parkinson
Filtering to those subjects may allow you to find some PSG records of patients with Parkinson's.
I think EDFbrowser and Polyman both have EDF header editing capabilities. These tools will also run checks similar to those in the "edfize" tool to confirm that the EDF header conforms to the EDF specification.
We have come across plenty of sleep software manufacturers who do not adhere to the EDF specification, which is unfortunate. I assume that is what you are dealing with here. It's easy to see from the quick edfize command you ran that the date does not conform to the "dd.mm.yy" requirement. Maybe one of the tools will be able to revise the header, rewrite the EDF, and produce something that functions in your other workflows.
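If you would like to check the date field yourself before reaching for an editor, here is a small sketch in Python. It relies on the fixed EDF header layout, where the start date occupies 8 ASCII bytes at offset 168 (after the 8-byte version, 80-byte patient, and 80-byte recording fields) and the start time follows in the next 8 bytes. The function names are hypothetical, and this is not how edfize itself is implemented.

```python
import re
from datetime import datetime
from typing import Tuple

STARTDATE_OFFSET = 168  # 8 (version) + 80 (patient id) + 80 (recording id)
FIELD_LENGTH = 8        # both the start date and start time are 8 bytes


def read_start_fields(header: bytes) -> Tuple[str, str]:
    """Return the raw (startdate, starttime) strings from an EDF header."""
    end = STARTDATE_OFFSET + 2 * FIELD_LENGTH
    if len(header) < end:
        raise ValueError("header is too short to contain the start date/time")
    date = header[STARTDATE_OFFSET:STARTDATE_OFFSET + FIELD_LENGTH]
    time = header[STARTDATE_OFFSET + FIELD_LENGTH:end]
    return date.decode("ascii", errors="replace"), time.decode("ascii", errors="replace")


def is_valid_edf_startdate(text: str) -> bool:
    """True if text looks like dd.mm.yy and is a real calendar date."""
    if not re.fullmatch(r"\d{2}\.\d{2}\.\d{2}", text):
        return False
    try:
        datetime.strptime(text, "%d.%m.%y")
    except ValueError:
        return False
    return True


# To check a file, read the first 256 bytes in binary mode:
#   with open("recording.edf", "rb") as f:   # hypothetical file name
#       date, time = read_start_fields(f.read(256))
#       print(date, is_valid_edf_startdate(date))
```

A date written with slashes or in a different field order will fail this check, which is the kind of nonconformance the header editors can then correct.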