The breakdown of the NSRR SHHS dataset, as originally obtained from BioLINCC, is as follows:
Thanks for checking out the resource!
Thanks for asking what to do as you bring another user onto your project. Our preferred approach is for your colleague to submit a DAUA of their own. Its contents can mirror your own DAUA and should mention your upcoming collaboration.
Thanks for checking out the site and the SHHS dataset. I think the varying number of commas per line that you're seeing comes from the "Comm" variable/column. "Comm" is a free-text field containing scorer notes about the quality of the overnight sleep study, and some of these notes include commas. However, any field value that contains commas is enclosed in double quotes, which most CSV parsers handle correctly. The dataset reads into Excel, SAS, and R correctly for me.
Example snippet from "Comm":
1,0,0,6,"Lot of alpha-delta sleep but not alpha intrusion. Sleeps on back entire time, only change in position when awake. Airflow choppy at times (-1 hr), chest very small amp (- 1 hr), Low baseline saO2 ~92%, Desats into 70's in REM.",0,8,8,8,8,8,8,
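To see the effect concretely, here is a small Python sketch using the standard `csv` module on the example line above. It compares a naive split on every comma with `csv.reader`, which treats a double-quoted field as a single value. The helper names `naive_split` and `csv_split` are just illustrative.

```python
import csv
import io

# The example record quoted above, as one CSV line. The "Comm" value is the
# double-quoted field and contains four commas of its own.
LINE = ('1,0,0,6,"Lot of alpha-delta sleep but not alpha intrusion. Sleeps on back '
        'entire time, only change in position when awake. Airflow choppy at times '
        '(-1 hr), chest very small amp (- 1 hr), Low baseline saO2 ~92%, Desats '
        'into 70\'s in REM.",0,8,8,8,8,8,8,')


def naive_split(line):
    """Split on every comma, ignoring quotes, so commas inside "Comm" add extra fields."""
    return line.split(",")


def csv_split(line):
    """Parse with csv.reader, which keeps a double-quoted field together."""
    return next(csv.reader(io.StringIO(line)))


print(len(naive_split(LINE)), len(csv_split(LINE)))  # 17 13
print(csv_split(LINE)[4].startswith("Lot of alpha-delta"))  # True
```

If you work in pandas instead, `pandas.read_csv` respects the double-quoted field the same way with its default settings.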
Here's a Stack Overflow post that describes handling commas in a CSV file.
Hope this helps!
Thanks for checking out the resource. The approach you have outlined looks correct to me.
I'll ask the scoring team to comment on the scoring procedures. There is some documentation here: https://sleepdata.org/datasets/shhs/pages/mop/6-00-mop-toc.md
All of these variables (like 'oanba4' or 'oanbp4') are calculated and output by the Compumedics Profusion software. I believe the software itself links the respiratory events with the arousals when it generates these counts.
If we take the difference between 'oanba4' and 'oanbp4', I think we would get a count of "obstructive apneas with arousals but WITHOUT a >= 4% oxygen desaturation (NREM/Supine)". I'm not sure how to get "OAs WITHOUT arousals, but WITH >= 4% desaturation (NREM/Supine)". These variables will make your head spin!
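Set notation shows why the difference works and why the second count is harder. This rests on an assumption not stated above: that 'oanba4' counts NREM/supine obstructive apneas with an arousal OR a >= 4% desaturation, and 'oanbp4' counts those with a >= 4% desaturation. Please check the SHHS data dictionary before relying on it. Let $A$ be the OAs with an arousal and $D$ the OAs with a >= 4% desaturation:

```latex
\begin{aligned}
\texttt{oanba4} &= |A \cup D|, \qquad \texttt{oanbp4} = |D| \\
\texttt{oanba4} - \texttt{oanbp4} &= |A \cup D| - |D| = |A \setminus D|
  && \text{(arousal, no } \geq 4\%\text{ desaturation)} \\
|D \setminus A| &= |D| - |A \cap D|
  && \text{(} \geq 4\%\text{ desaturation, no arousal)}
\end{aligned}
```

The last line needs $|A \cap D|$ (equivalently $|A|$, since $|A \cap D| = |A| - |A \setminus D|$), which neither variable supplies on its own under this assumption.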
I think you may have to mine the raw data (EDF/XML) to get at these sorts of things.
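One way to get at "OAs without arousals but with >= 4% desaturation" from the raw data is to pull apnea, arousal, and desaturation events out of the XML and link them by time. Below is a minimal Python sketch of that linking step on events already in memory. The `Event` record, the concept strings, the `drop_pct` field, and the 5-second `LINK_WINDOW_S` are all hypothetical choices for illustration, not the Profusion or SHHS scoring rules (the MOP link above is the place to check those). It also ignores sleep stage and body position, so it does not apply the NREM/supine restriction.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical event record; field names are illustrative, not an NSRR schema.
@dataclass
class Event:
    concept: str                      # e.g. "Obstructive apnea", "Arousal", "SpO2 desaturation"
    start: float                      # seconds from recording start
    duration: float                   # seconds
    drop_pct: Optional[float] = None  # desaturation depth (baseline - nadir), if any


# ASSUMPTION: an arousal or desaturation is "linked" to an apnea if it starts
# between the apnea onset and this many seconds after the apnea ends.
LINK_WINDOW_S = 5.0


def linked(apnea: Event, other: Event, window: float = LINK_WINDOW_S) -> bool:
    """True if `other` starts within [apnea start, apnea end + window]."""
    end = apnea.start + apnea.duration
    return apnea.start <= other.start <= end + window


def classify(events: List[Event], min_drop: float = 4.0) -> dict:
    """Count obstructive apneas by whether a linked arousal and/or deep desaturation exists."""
    apneas = [e for e in events if e.concept == "Obstructive apnea"]
    arousals = [e for e in events if e.concept == "Arousal"]
    desats = [e for e in events
              if e.concept == "SpO2 desaturation"
              and e.drop_pct is not None and e.drop_pct >= min_drop]
    counts = {"arousal_no_desat": 0, "desat_no_arousal": 0, "both": 0, "neither": 0}
    for a in apneas:
        has_arousal = any(linked(a, x) for x in arousals)
        has_desat = any(linked(a, d) for d in desats)
        if has_arousal and has_desat:
            counts["both"] += 1
        elif has_arousal:
            counts["arousal_no_desat"] += 1
        elif has_desat:
            counts["desat_no_arousal"] += 1
        else:
            counts["neither"] += 1
    return counts


events = [
    Event("Obstructive apnea", 100.0, 15.0),
    Event("Arousal", 116.0, 5.0),                  # starts 1 s after the apnea ends
    Event("Obstructive apnea", 300.0, 12.0),
    Event("SpO2 desaturation", 310.0, 20.0, 5.0),  # >= 4% drop
    Event("Obstructive apnea", 500.0, 10.0),
    Event("SpO2 desaturation", 505.0, 15.0, 3.0),  # < 4% drop, ignored
]
print(classify(events))
# {'arousal_no_desat': 1, 'desat_no_arousal': 1, 'both': 0, 'neither': 1}
```

The "desat_no_arousal" bucket is the count that the summary variables don't seem to expose directly.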
What are you trying to do once you parse the XML annotation files?
I don't use python, but I did something in R a couple years ago: https://sleepdata.org/tools/mrueschman-xml-annotation-extractor
Another example in Ruby: https://sleepdata.org/tools/ruby-script-tutorial-05
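If the goal is simply to read events out of the XML annotation files in Python, the standard library's `xml.etree.ElementTree` is enough. This is a minimal sketch, assuming an NSRR-style layout of `ScoredEvent` elements with `EventConcept`, `Start`, and `Duration` children, and concepts stored as `Name|Label`. The sample XML is made up and trimmed, so please check the element names against your own files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ASSUMPTION: a made-up, trimmed annotation file in an NSRR-style layout.
SAMPLE_XML = """<PSGAnnotation>
  <EpochLength>30</EpochLength>
  <ScoredEvents>
    <ScoredEvent>
      <EventType>Respiratory|Respiratory</EventType>
      <EventConcept>Obstructive apnea|Obstructive Apnea</EventConcept>
      <Start>100.0</Start>
      <Duration>15.0</Duration>
    </ScoredEvent>
    <ScoredEvent>
      <EventType>Arousals|Arousals</EventType>
      <EventConcept>Arousal|Arousal</EventConcept>
      <Start>116.0</Start>
      <Duration>5.0</Duration>
    </ScoredEvent>
    <ScoredEvent>
      <EventType>Respiratory|Respiratory</EventType>
      <EventConcept>Obstructive apnea|Obstructive Apnea</EventConcept>
      <Start>300.0</Start>
      <Duration>12.0</Duration>
    </ScoredEvent>
  </ScoredEvents>
</PSGAnnotation>"""


def load_events(xml_text):
    """Return (concept, start_s, duration_s) tuples for every ScoredEvent."""
    root = ET.fromstring(xml_text)
    events = []
    for ev in root.iter("ScoredEvent"):
        concept = ev.findtext("EventConcept", default="")
        # ASSUMPTION: concepts look like "Name|Label"; keep the part before "|".
        name = concept.split("|")[0]
        start = float(ev.findtext("Start", default="0"))
        duration = float(ev.findtext("Duration", default="0"))
        events.append((name, start, duration))
    return events


counts = Counter(name for name, _, _ in load_events(SAMPLE_XML))
print(counts["Obstructive apnea"])  # 2
print(counts["Arousal"])  # 1
```

For a real file on disk, `ET.parse(path).getroot()` can replace the `ET.fromstring` call.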
I asked a couple folks here who know more about the CCSHS EDFs to comment.
Yes! Many participants appear in the Visit 1 and Visit 2 datasets, as well as the CVD Outcomes dataset. This allows you to look at data from the same participants for up to ~15 years of follow-up.
Thanks for exploring our datasets. The Cleveland Family Study did have multiple visits, though data from only one of the visits is currently posted, so there aren't any longitudinal results to explore.
You said "NHHS", but I think you mean the Sleep Heart Health Study (SHHS). In that case, subjects can be matched between visits using the "nsrrid" column.
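Once the Visit 1 and Visit 2 files are loaded, pairing them is a join on `nsrrid`. Here is a minimal Python sketch with the standard `csv` module. The two CSV strings, the ID values, and the `value` column are made-up stand-ins; only `nsrrid` comes from the text above.

```python
import csv
import io

# Hypothetical, trimmed extracts of the Visit 1 and Visit 2 files.
# "value" stands in for whatever variable you want to compare across visits.
VISIT1 = "nsrrid,value\n200001,10.5\n200002,3.2\n200003,7.0\n"
VISIT2 = "nsrrid,value\n200001,12.1\n200003,6.4\n"


def by_id(text):
    """Index the rows of one visit file by participant ID."""
    return {row["nsrrid"]: row for row in csv.DictReader(io.StringIO(text))}


def link_visits(v1_text, v2_text):
    """Inner join on nsrrid: keep only participants present in both visits."""
    v1, v2 = by_id(v1_text), by_id(v2_text)
    return {pid: (v1[pid]["value"], v2[pid]["value"]) for pid in v1 if pid in v2}


paired = link_visits(VISIT1, VISIT2)
print(paired)  # {'200001': ('10.5', '12.1'), '200003': ('7.0', '6.4')}
```

In pandas, `pd.merge(visit1, visit2, on="nsrrid", suffixes=("_v1", "_v2"))` performs the same inner join.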
Unfortunately not. We only have the limited set of channels from the Unicorder, so there is no EEG or sleep staging.