SHHS Dataset v0.3.0 released!

[-] remomueller +0 points · about 3 years ago

We've just released an update to the SHHS dataset. This release is based on the final Johns Hopkins BioLINCC dataset. We've also spent time cleaning impossible values out of the dataset and have created a list of known issues for variables with questionable values.

The full release notes are available on the SHHS Data Dictionary repository on GitHub.

The updated charts and tables for SHHS are available here: https://sleepdata.org/datasets/shhs/variables

Special thanks to @mcailler and @michellereid who created and closed over 955 issues in preparing this release for primetime!


0.3.0 Changes

  • Removed in_shhs2_lad variable (i.e. participant was part of SHHS2 Limited Access Dataset) as it is no longer relevant
  • The SAS export now adds race, gender, and age at SHHS1 to each of the CSV datasets
    • Missing codes are now removed by default from all variables in SHHS1 and SHHS2
    • Null values (in the form of zeroes) are now removed from variables where appropriate
  • Valid race domain choices were changed to be White, Black, and Other
  • Valid gender values were updated from characters to numeric values for consistency with other domains
  • Ethnicity has been added as a separate variable, rather than being classified within the race domain
  • Visit number has been re-added to each of the BioLINCC datasets
  • Categorical age has been added to SHHS1 and SHHS2 BioLINCC datasets
    • Categorical age at SHHS1 has been added to the SHHS2 dataset
  • The obfuscated pptid has now been added to all datasets by default as obf_pptid
  • Issues with data from the BioLINCC datasets for v0.3.0 (i.e. extreme values) have been grouped into a Known Issues list
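
Since obf_pptid is now present in every dataset, it can serve as the key for linking a participant's SHHS1 and SHHS2 rows. The following is only a minimal sketch of that idea using Python's standard library: the inline CSV excerpts, the IDs, and every column name other than obf_pptid are made up for illustration and are not taken from the actual release files.

```python
import csv
import io

# Hypothetical excerpts standing in for the exported CSV files.
# Only the obf_pptid column name comes from the release notes;
# the other columns and all values are invented for this sketch.
SHHS1_CSV = """obf_pptid,visitnumber,age_s1
1001,1,55
1002,1,63
"""
SHHS2_CSV = """obf_pptid,visitnumber,age_s2
1001,2,60
"""


def read_by_id(text, key="obf_pptid"):
    """Parse CSV text into a dict of rows keyed by the participant ID column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def join_visits(first, second):
    """Return (id, first_row, second_row) for participants present in both visits."""
    joined = []
    for pid, row1 in first.items():
        row2 = second.get(pid)
        if row2 is not None:
            joined.append((pid, row1, row2))
    return joined


shhs1 = read_by_id(SHHS1_CSV)
shhs2 = read_by_id(SHHS2_CSV)
pairs = join_visits(shhs1, shhs2)

for pid, row1, row2 in pairs:
    print(pid, row1["age_s1"], row2["age_s2"])
```

With real exports, the same functions could be fed the contents of the downloaded files instead of the inline strings. Note that csv.DictReader returns every value as a string, so numeric columns need converting before analysis.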