The XML annotation files contain timestamped markers for apneas and hypopneas. For SHHS Visit 1, they are available here: https://sleepdata.org/datasets/shhs/files/polysomnography/annotations-events-profusion/shhs1
I received a response from the analyst, posted below.
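In case it helps, here is a minimal sketch of pulling apnea and hypopnea markers out of one of these files. The element names (`ScoredEvent`, `Name`, `Start`, `Duration`) and the sample XML are assumptions for illustration, so please check them against an actual export before relying on this.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the element names are assumed, not taken from a real export.
SAMPLE_XML = """
<CMPStudyConfig>
  <ScoredEvents>
    <ScoredEvent><Name>Obstructive Apnea</Name><Start>120.0</Start><Duration>15.5</Duration></ScoredEvent>
    <ScoredEvent><Name>SpO2 desaturation</Name><Start>130.0</Start><Duration>20.0</Duration></ScoredEvent>
    <ScoredEvent><Name>Hypopnea</Name><Start>300.0</Start><Duration>22.0</Duration></ScoredEvent>
  </ScoredEvents>
</CMPStudyConfig>
"""

def respiratory_events(xml_text):
    """Return (name, start_s, duration_s) for every apnea or hypopnea event."""
    root = ET.fromstring(xml_text)
    found = []
    for ev in root.iter("ScoredEvent"):
        name = ev.findtext("Name", default="")
        lowered = name.lower()
        if "apnea" in lowered or "hypopnea" in lowered:
            start = float(ev.findtext("Start"))
            duration = float(ev.findtext("Duration"))
            found.append((name, start, duration))
    return found

events = respiratory_events(SAMPLE_XML)
```

For a file on disk, `ET.parse(path).getroot()` would replace `ET.fromstring`.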
Does this have to do with artefacts?
Why did you choose exactly 0.35, 2.5, 180 and 1000? Were these values set empirically, or are they based on a paper?
They were set empirically.
These thresholds were set empirically with the objective of removing artifacts.
Even though automated QRS annotations were corrected as appropriate by a trained technician, a residual number of beats could have been incorrectly annotated. NN intervals < 0.35 s are artifact because they fall within the refractory period of the heart. There is also a very high chance that NN intervals > 2.5 s (heart rate less than 24 bpm) are misdetections rather than long pauses. These thresholds were meant to exclude misdetections and mislabeled beats.
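For anyone who wants to apply the same cutoffs to their own beat annotations, a minimal sketch follows. It assumes the NN intervals are already in seconds; the function name and example values are made up, and this is not the analysts' actual code.

```python
import numpy as np

MIN_NN_S = 0.35  # shorter intervals fall within the refractory period
MAX_NN_S = 2.5   # longer intervals (heart rate < 24 bpm) are likely misdetections

def filter_nn_intervals(nn_s):
    """Return the NN intervals within [MIN_NN_S, MAX_NN_S] and the boolean keep-mask."""
    nn_s = np.asarray(nn_s, dtype=float)
    keep = (nn_s >= MIN_NN_S) & (nn_s <= MAX_NN_S)
    return nn_s[keep], keep

# Made-up example: the second and fourth intervals are rejected.
nn = [0.92, 0.30, 0.95, 2.80, 1.01]
clean, mask = filter_nn_intervals(nn)
```

Keeping the mask alongside the filtered values makes it easy to see where gaps were introduced in the series.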
“180”: 180 beats in 5 minutes corresponds to a heart rate of 36 bpm. If a 5-min window did not have at least 180 beats, that was most likely due to artifact and/or the presence of non-sinus beats. For some perspective, the population median [25th and 75th percentiles] of the average NN interval was 939 [859 – 1033] ms, which in terms of heart rate corresponds to 63.9 [58 – 69.8] bpm.
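The window check could be sketched roughly as below. Non-overlapping windows starting at time zero are an assumption here (the reply does not say how windows were placed), and the function name and synthetic beat times are illustrative only.

```python
import numpy as np

WINDOW_S = 300   # 5-minute window
MIN_BEATS = 180  # 180 beats / 5 min = 36 bpm

def valid_windows(beat_times_s, record_length_s):
    """Count beats per non-overlapping 5-min window and flag windows with >= MIN_BEATS."""
    n_windows = int(record_length_s // WINDOW_S)
    if n_windows == 0:
        return np.array([], dtype=int), np.array([], dtype=bool)
    edges = np.arange(n_windows + 1) * WINDOW_S
    counts, _ = np.histogram(beat_times_s, bins=edges)
    return counts, counts >= MIN_BEATS

# Synthetic record: one beat per second for 5 min, then one beat every 2 s.
beats = np.concatenate([np.arange(0, 300, 1.0), np.arange(300, 600, 2.0)])
counts, ok = valid_windows(beats, record_length_s=600)
```

In this synthetic record the second window holds only 150 beats (30 bpm), so it would be dropped.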
Note that HRV is a technique that only applies to NN interval time series. Furthermore, there is no consensus on how to perform frequency analysis of “discontinuous” (due to the deletion of non-sinus beats) time series. Thus we wanted to limit the number of these windows.
“1000”: Participants with fewer than 1000 NN intervals (~15 mins) over the full night were immediately excluded. Either they were not in sinus rhythm or signal quality was an issue. The criterion of analyzing only participants with at least 2 h of combined N1, N2, N3-N4, and REM was based on the idea that if we wanted to compare HRV across sleep stages, we needed a minimum amount of data. In addition, for the generation of full-night (from sleep onset to sleep termination) summary statistics, we wanted to avoid putting together those who spent most of the sleep period awake with those who slept “much more”.
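Put together, the two participant-level criteria amount to something like the sketch below. The stage labels, the dictionary layout, and the function name are assumptions made for illustration (N3-N4 is folded into a single "N3" key).

```python
MIN_NN_INTERVALS = 1000  # fewer than this over the full night -> excluded
MIN_SLEEP_MIN = 120      # at least 2 h of combined N1, N2, N3-N4, REM
SLEEP_STAGES = ("N1", "N2", "N3", "REM")  # hypothetical labels

def include_participant(n_nn_intervals, stage_minutes):
    """stage_minutes maps a stage label to minutes spent in that stage."""
    total_sleep = sum(stage_minutes.get(stage, 0.0) for stage in SLEEP_STAGES)
    return n_nn_intervals >= MIN_NN_INTERVALS and total_sleep >= MIN_SLEEP_MIN

long_night = {"N1": 20, "N2": 180, "N3": 40, "REM": 60, "Wake": 90}
short_night = {"N1": 10, "N2": 60, "N3": 10, "REM": 20, "Wake": 300}

kept = include_participant(25000, long_night)            # enough beats, 300 min asleep
too_few_beats = include_participant(900, long_night)     # fails the NN-interval criterion
too_little_sleep = include_participant(25000, short_night)  # only 100 min asleep
```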
Thanks for your inquiry. I reached out to the leader of the MESA HRV analysis with some of your questions. I will post again when I hear back.
Thanks for your inquiry. Your course of action is exactly what I would recommend. We believe the respiratory event annotations and timestamps are generally correct.
The "ahi_a0h3a" variable was computed from a set of component variables that were output soon after the original scoring of the study (i.e. between 1995-2005 for SHHS1/SHHS2). These component variables should be more intact than the XML annotation exports, which were done many years later in a different version of the Profusion software. Hence, we have a component variable that tells us "# of hypopneas with >=4% desaturation", so there is no need to recompute this tally from the underlying XML file, which we know for certain has degraded SpO2 information.
I seem to recall discussion about the SpO2 desaturations being shifted 30 seconds, though I can't say for certain. Regardless, our suggestion is to "re-detect" the SpO2 desaturations and align/link them with the scored respiratory events for analyses like the one you describe.
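As a rough illustration of the linking step, the sketch below attaches to each scored event the largest desaturation whose nadir falls between the event start and a fixed lag after the event end. The 45-second lag, the tuple layouts, and the function name are all assumptions for this example, not NSRR's method.

```python
def link_desaturations(events, desats, max_lag_s=45.0):
    """events: list of (start_s, duration_s).
    desats: list of (nadir_time_s, drop_pct) from your own detection step.
    Returns, per event, the largest linked drop, or None if nothing is linked."""
    linked = []
    for start, duration in events:
        window_end = start + duration + max_lag_s
        drops = [drop for nadir, drop in desats if start <= nadir <= window_end]
        linked.append(max(drops) if drops else None)
    return linked

# Made-up data: the first event links to a 4% drop, the second to nothing.
events = [(100.0, 20.0), (400.0, 15.0)]
desats = [(135.0, 4.0), (150.0, 3.0), (700.0, 5.0)]
linked = link_desaturations(events, desats)

# Tally of events with an associated desaturation of at least 4%.
n_ge4 = sum(1 for drop in linked if drop is not None and drop >= 4.0)
```

Tallies produced this way can then be compared against the component variables in the dataset as a sanity check.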
Good luck and stay well!
Thanks for checking out the resource. We have had hundreds of students sign up and receive access to datasets. Most data requests are reviewed and processed within 1-2 weeks of submission.
I see you submitted a data request yesterday. Your request is likely to be returned to you for lack of details in your Specific Purpose. Our goal in reviewing data requests is to adhere to the data sharing language laid out in the original informed consent documents signed by human research subjects. As such, we require incoming users to provide a description (typically 1 or 2 paragraphs will suffice) of what they are going to do with the data, e.g. what variables/topics are of interest, what statistical tests they plan to run, what machine learning methods will be used, etc.
Feel free to ask more questions here as you proceed through the data request process.
This is a challenge, especially in SHHS. There are noteworthy issues with the XML annotation files, as described on this documentation page. In particular, the SpO2 desaturation annotations and their linking with respiratory events were affected by version changes in the Profusion scoring and exporting software over time.
The line "Desaturation" is supposed to tell you the level of the associated desaturation, though this indication will be unreliable in SHHS (due to known issues).
There is another variable in the dataset, "rdi0p", which should give you a total event (i.e. all apneas and all hypopneas) index. The value for subject 200155 is 77.8, which is 166 total events. Some events may not be counted by the scoring software due to starting or ending in wake.
If you want to pursue this further our suggestion is to implement your own desaturation detection algorithm to recreate the SHHS SpO2 desaturation events and linking with respiratory events. We have done some work on this ourselves, though we aren't actively preparing those data for release.
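To give a sense of what such a detector might look like, here is a deliberately simple sketch: it follows each monotone descent in the SpO2 trace and reports it if the fall from the preceding peak reaches a threshold. The 3% threshold, the 120-second limit, and the function name are assumptions; this is not the SHHS/Profusion algorithm, and a real signal would need artifact handling and smoothing first.

```python
def detect_desaturations(spo2, fs=1.0, min_drop=3.0, max_fall_s=120.0):
    """Return (peak_time_s, nadir_time_s, drop_pct) for each qualifying descent.
    spo2 is a sequence of SpO2 samples (%), fs is the sampling rate in Hz."""
    events = []
    n = len(spo2)
    i = 0
    while i < n - 1:
        if spo2[i + 1] < spo2[i]:
            peak = i
            j = i + 1
            nadir = j
            # Follow the descent (plateaus allowed); keep the first lowest sample.
            while j + 1 < n and spo2[j + 1] <= spo2[j]:
                j += 1
                if spo2[j] < spo2[nadir]:
                    nadir = j
            drop = float(spo2[peak] - spo2[nadir])
            if drop >= min_drop and (nadir - peak) / fs <= max_fall_s:
                events.append((peak / fs, nadir / fs, drop))
            i = j
        else:
            i += 1
    return events

# Made-up 1 Hz trace: one 4% fall, then a 1% dip that is ignored.
trace = [97, 97, 96, 94, 93, 93, 95, 97, 97, 96, 97]
found = detect_desaturations(trace)
```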
Thanks for checking out the resource.
Hello again - thanks for your patience.
For MESA - would you please re-download and check 2852 again? This study looks OK to us. This study was re-exported/fixed around the time of Brian's original data request, so perhaps you have an old copy.
For the other two MESA studies (1738, 6476), we traced the discrepancy back to data loss/corruption in the original scoring files, which caused these unexpected mismatches. We were unable to fix the scoring data at the source to match the data you see in the summary result file (CSV). I will make a note of this issue with the scored data export here: https://sleepdata.org/datasets/mesa/pages/polysomnography-introduction.md
I will look into the CFS issues next.
Thanks to you both. Please have Minsoo submit a data request for MESA, CFS, and whatever other datasets your team is working with. He can use the same sort of language as in brianhoonsukbyun's request.
Your findings are not entirely unanticipated. We have encountered such discrepancies ourselves. Right now the NSRR team is undertaking a large-scale effort to review all our datasets for issues exactly like this. We will make corrections whenever possible or otherwise note that the issue exists (and possibly why).
I hope to have a chance to look at a handful of these specifically and report back some initial findings next week.
Thanks for bringing this to our attention. I will explore some of these discrepancies. Can you please answer a couple of questions?
I'm glad that helped. Thank you for the kind words. I will share your links with other members of the team and bring it up at our next group meeting.
We are familiar with CDEs and have toyed around a bit with linking NSRR data back to established CDEs. What you describe in your aims for HIV/AIDS sounds very similar to what we have contemplated for studies/trials of sleep.