Thanks for that.
I'll do some sensitivity analysis to make sure that there is consistency between similar variables anyway.
I'm developing and evaluating a number of measurements of OSA-related pathology, and comparing them to standard measurements (i.e. AHI) in their ability to predict long-term outcomes (in this case, all-cause mortality). As such, I'm attempting to replicate elements of previously published analyses.
I have a couple of questions to tie up a few loose ends:
The sample size for SHHS1 was approximately 6400. However, the data available through NSRR includes approximately 5800 participants. Was this because one of the parent cohorts didn't have data sharing permission built into the original consent?
Similar information appears to be coded in variables from different sources. In particular, there are the cardiovascular history variables from:
(i) The parent cohorts (i.e. prev_mi, prev_stk, etc.). However, approximately 20% of patients have missing data for these variables (presumably because whole parent cohorts are missing).
(ii) The questionnaires from patient recruitment to SHHS (i.e. MI15, STROKE15, etc.).
The latter is appealing because data is available for almost all the patients, but I'm not sure if these were considered the primary information about cardiovascular disease history in the original study design. Ultimately, it is probably most important that the variables I use are consistent with previous literature. Do you know which ones match the variables used in key publications (particularly Punjabi et al., PLoS Med., 2009, and Redline et al., AJRCCM, 2010)?
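A minimal sketch of how the two sources could be compared, in Python for illustration. It assumes both variables have been recoded to 1 (yes), 0 (no) or None (missing) and are in the same participant order; the actual SHHS coding may differ, and the values below are made up rather than taken from the dataset.

```python
from collections import Counter


def compare_sources(parent, questionnaire):
    """Cross-tabulate two codings of the same history item.

    Each input is a list of 1 (yes), 0 (no) or None (missing),
    aligned by participant. The 0/1/None coding is an assumption.
    """
    if len(parent) != len(questionnaire):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(parent, questionnaire))
    # Only participants with a value in both sources can agree or disagree.
    both = [(p, q) for p, q in pairs if p is not None and q is not None]
    agree = sum(1 for p, q in both if p == q)
    return {
        "table": Counter(pairs),
        "n_both_present": len(both),
        "missing_parent": sum(1 for p in parent if p is None),
        "missing_questionnaire": sum(1 for q in questionnaire if q is None),
        "agreement": agree / len(both) if both else None,
    }


# Made-up values for illustration only (not SHHS data).
prev_mi = [0, 1, None, 0, None, 1, 0, 0]
mi15 = [0, 1, 0, 0, 1, 0, 0, None]
result = compare_sources(prev_mi, mi15)
print(result["n_both_present"], result["agreement"])
```

The cross-table in `result["table"]` also shows whether missingness in one source clusters with a particular answer in the other, which matters if the missing 20% comes from whole parent cohorts.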
Similarly, which AHI variable was used by these key publications? My guess based on methods sections of papers is ahi_a0h4, but it would be great to confirm if possible.
Cheers and thanks,
That seems to match my impression from the data.
I guess the main concern will be whether the averaging time was consistently set across all the patients in the cohort. It might be hard to know now whether this was hard-coded on the Compumedics device or whether the collection centres had control over it.
I'm wondering if anyone knows what the averaging time is for the raw SaO2 data recorded in SHHS1?
I've had a look through the technical documentation and key publications, but I haven't had any luck finding this. The sampling rate (1 Hz) hints that it may be a 2 s averaging time (based on the Nyquist criterion), and the data itself appears to have a fairly quick physiological response, but I know a lot of the old Nonin oximeters were set up with default averaging times of 16 or 32 s.
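To illustrate why the averaging time matters here, a rough Python sketch on a synthetic 1 Hz trace. It assumes a simple trailing boxcar average, which is a simplification (real oximeters may use beat-based or weighted averaging), and the event shape (6% fall over 10 s, recovery over 5 s) is invented for illustration. It cannot establish what SHHS1 actually used; it only shows how different the same event would look at 2, 16 and 32 s.

```python
def moving_average(signal, window):
    """Trailing boxcar average; early samples average whatever is available."""
    out = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def synthetic_desaturation(baseline=96.0, depth=6.0, fall_s=10, rise_s=5, pad_s=60):
    """1 Hz trace: flat baseline, linear fall, faster linear recovery, flat baseline."""
    fall = [baseline - depth * k / fall_s for k in range(1, fall_s + 1)]
    rise = [baseline - depth + depth * k / rise_s for k in range(1, rise_s + 1)]
    return [baseline] * pad_s + fall + rise + [baseline] * pad_s


trace = synthetic_desaturation()
depths = {}
for window_s in (2, 16, 32):
    averaged = moving_average(trace, window_s)
    # Apparent depth of the event after averaging, relative to baseline.
    depths[window_s] = 96.0 - min(averaged)
    print(f"{window_s:>2} s averaging: apparent nadir depth {depths[window_s]:.2f}%")
```

With these made-up numbers the 6% fall reads as roughly 5.7%, 2.8% and 1.4% at 2, 16 and 32 s respectively, so a quick physiological response in the recorded data is at least indirect evidence against the longer settings.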
cheers and thanks,
Any idea if these problems also exist in what I presume to be the original annotation files (i.e. shhs1-200001-nsrr.xml)? Since these are .xml files, I'm guessing they are also post-Windows conversion, and are therefore likely to have the same problems.
I've already written a fairly simple desaturation scoring algorithm (currently with validation data in infants but not adults); so I'll use this for my pilot analysis and will continue putting some more work into this.
Once I have something that I'm reasonably happy with (and with some reasonable validation in a reliably scored dataset), I'll get in touch about uploading it here.
There will be some downstream challenges in re-associating desaturations with the manually scored respiratory events in order to classify respiratory events according to current AASM criteria (my understanding from the documentation is that at present, annotated hypopnoeas are scored as any ventilatory disturbance, with no desaturation or arousal criteria).
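A rough Python sketch of the two steps described above: a very simple desaturation detector and a rule for linking each scored respiratory event to a desaturation. This is not the algorithm mentioned in this post. The 3% drop, 1% recovery, 120 s maximum fall time and 30 s linking lag are all hypothetical parameters chosen for illustration, and the trace and event times are synthetic.

```python
def detect_desaturations(spo2, fs=1.0, min_drop=3.0, recovery=1.0, max_fall_s=120.0):
    """Return (start_s, nadir_s, drop) tuples for falls of at least min_drop.

    A fall runs from the last sample before SpO2 starts dropping to the
    lowest sample reached before it recovers by `recovery` or returns to
    the starting level. A fall still in progress at the end is not reported.
    """
    events = []
    peak_i = nadir_i = 0
    for i in range(1, len(spo2)):
        x = spo2[i]
        if x < spo2[nadir_i]:
            nadir_i = i  # still falling: extend the nadir
            continue
        if x >= spo2[peak_i] or x - spo2[nadir_i] >= recovery:
            drop = spo2[peak_i] - spo2[nadir_i]
            if drop >= min_drop and (nadir_i - peak_i) / fs <= max_fall_s:
                events.append((peak_i / fs, nadir_i / fs, drop))
            peak_i = nadir_i = i  # restart the search from this sample
    return events


def link_events(resp_events, desats, max_lag_s=30.0):
    """Pair each (start_s, end_s) respiratory event with the first desaturation
    whose nadir falls between event start and event end + max_lag_s, else None."""
    linked = []
    for start, end in resp_events:
        match = next((d for d in desats if start <= d[1] <= end + max_lag_s), None)
        linked.append(((start, end), match))
    return linked


# Synthetic 1 Hz trace: 60 s at 96%, 10 s fall to 90%, 5 s recovery, 60 s at 96%.
trace = ([96.0] * 60
         + [96.0 - 6.0 * k / 10 for k in range(1, 11)]
         + [90.0 + 6.0 * k / 5 for k in range(1, 6)]
         + [96.0] * 60)
desats = detect_desaturations(trace)
linked = link_events([(40.0, 58.0), (100.0, 115.0)], desats)
for event, desat in linked:
    print(event, "->", desat)
```

Here the first respiratory event picks up the desaturation and the second does not, which is the distinction needed before applying any desaturation-based hypopnoea criterion. Noise handling, artefact rejection and overlapping events are deliberately left out.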
Thanks for such a quick response Mike.
I'm working with the raw EDF PSGs and Profusion annotations in the Sleep Heart Health Study data (SHHS1). We have custom MATLAB code which then merges this data back together for further analysis.
I note that there seem to be a number of studies where desaturation events have only been scored in the first half or two thirds of the night (at least according to the .xml file). However, on examination of the SpO2 data, there are clear desaturations associated with reductions in ventilation in the thermistor/RIP channels and with scored respiratory events.
I have gone back to the raw .xml files to check that this is not an error in our import/analysis code.
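A minimal Python sketch of this kind of check, independent of our MATLAB import code. The element names (`ScoredEvent`, `EventConcept` or `Name`, `Start`, `Duration`) are assumptions about the annotation schema and should be checked against a real file; the XML below is a made-up fragment, and the 0.75 cut-off is an arbitrary illustration.

```python
import xml.etree.ElementTree as ET


def last_event_times(xml_text, desat_label="desaturation"):
    """Return (last desaturation end, last other event end) in seconds.

    Tag names are assumed, not confirmed against the SHHS files.
    """
    root = ET.fromstring(xml_text)
    last_desat = 0.0
    last_other = 0.0
    for ev in root.iter("ScoredEvent"):
        label = ev.findtext("EventConcept") or ev.findtext("Name") or ""
        start = ev.findtext("Start")
        duration = ev.findtext("Duration")
        if start is None or duration is None:
            continue  # skip events without timing information
        end = float(start) + float(duration)
        if desat_label in label.lower():
            last_desat = max(last_desat, end)
        else:
            last_other = max(last_other, end)
    return last_desat, last_other


# Made-up annotation fragment for illustration only.
example = """
<PSGAnnotation><ScoredEvents>
  <ScoredEvent><EventConcept>SpO2 desaturation</EventConcept><Start>1200</Start><Duration>30</Duration></ScoredEvent>
  <ScoredEvent><EventConcept>Hypopnea</EventConcept><Start>1190</Start><Duration>20</Duration></ScoredEvent>
  <ScoredEvent><EventConcept>Hypopnea</EventConcept><Start>20000</Start><Duration>25</Duration></ScoredEvent>
</ScoredEvents></PSGAnnotation>
"""
last_desat, last_other = last_event_times(example)
# Flag studies where desaturation scoring stops well before other scoring does.
suspicious = last_desat < 0.75 * last_other
print(last_desat, last_other, suspicious)
```

Running something like this over every annotation file would give a count of affected studies rather than an impression from a subset.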
Some example studies include:
However, it seems very widespread in the subset I have been looking at.
Has anyone else run into this problem? I'm wondering if the export function has not worked correctly?