mrueschman

mrueschman
Joined Oct 2013
Bio

NSRR staff

Boston, MA

0000-0002-0506-8368

Adam,

Thanks -- good question. You also asked about versioning in your email to support@sleepdata.org, so I am going to post my reply here about that issue and the race issue you note.

The idea behind our versioning is that the most recent version (0.8.0 for SHHS) would be the “latest and greatest” and would be our suggested starting point for new analyses. From 0.3.0 onward we split the dataset into separate CSVs per visit, which would explain why 0.2.0 has more observations in its single file than the files that came later. Also, around the switch from 0.2.0 to 0.3.0 we received updated data from the dataset owner (Johns Hopkins in this case) that added more cases to our “CVD Outcomes” dataset. We took down 0.3.0 because it contained records for SHHS subjects who did not consent to share data for future research.

Yes, the race data were collapsed into 3 categories by the SHHS dataset owners, which explains the difference between 0.2.0 and 0.4.0+. Our NSRR data mimic what is posted on BioLINCC (https://biolincc.nhlbi.nih.gov/studies/shhs/?q=shhs) – our 0.2.0 version of the data came from a preliminary BioLINCC dataset which did not have the race variable change incorporated yet.

Technically one could look back to the older dataset (possibly merging with a newer version) to get the race variable with more fine-grained categories, but we have not carried these data forward into subsequent releases since this is how the dataset owners have immortalized the dataset on BioLINCC. My best guess is that this change was made to more closely match a quasi-standard of how race is presented in BioLINCC datasets. Most datasets that I have seen from BioLINCC have this Black/White/Other breakdown.
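
If you do want to try the merge described above, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption for illustration: the column names `subject_id` and `race`, the helper name `attach_detailed_race`, and the toy rows are made up and are not taken from the actual SHHS files. It also assumes the same subject identifier appears in both versions, which you would need to verify before relying on the result.

```python
import csv
import io


def attach_detailed_race(old_rows, new_rows, id_col="subject_id", race_col="race"):
    """Left-join the older race value onto the newer rows.

    Hypothetical helper: id_col and race_col are placeholder column names.
    Rows in new_rows with no match in old_rows get None for the new column.
    """
    # Build a lookup from subject ID to the older (finer-grained) race code.
    old_race = {row[id_col]: row[race_col] for row in old_rows}

    merged = []
    for row in new_rows:
        out = dict(row)  # copy so the input rows are not modified
        out[race_col + "_detailed"] = old_race.get(row[id_col])
        merged.append(out)
    return merged


# Toy data invented for this example; real files would be read with open().
OLD_CSV = "subject_id,race\n1,4\n2,1\n3,2\n"
NEW_CSV = "subject_id,race\n1,3\n2,1\n4,2\n"

old_rows = list(csv.DictReader(io.StringIO(OLD_CSV)))
new_rows = list(csv.DictReader(io.StringIO(NEW_CSV)))

merged = attach_detailed_race(old_rows, new_rows)
for row in merged:
    print(row)
```

The join keeps every row of the newer file and only adds a column, so subjects present in the older file but absent from the newer one are not carried over.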

As for your other question about the parent cohorts: There is no way to identify the parent cohort of SHHS participants from the NSRR datasets. These links were explicitly removed by the dataset owners as part of the de-identification process when posting on BioLINCC. I believe that if you went through BioLINCC to request and obtain access to the parent cohorts (e.g. Framingham, ARIC, etc.), they may grant access to the linking codes (a lookup table with IDs across the different data sources).

Hope this helps. Thanks!