CFS Dataset v0.1.0 released!

mrueschman · almost 3 years ago

The first version of the Cleveland Family Study Visit 5 dataset is now available. This dataset accompanies 730 EDF and XML annotation files that are available for more in-depth analysis.

Future dataset iterations for CFS will include more variables, along with data from earlier study visits.

Here's a summary of the release:

0.1.0 Changes

  • Initial import from family3dd.xls
  • All non-calculated variables are associated with forms
  • Redundant identifier variables have been removed from the data dictionary
  • Domains have been created for all variables originally marked as type: choices
  • Fixed several outliers and negative or implausible values
  • Variables have now been associated with forms, where appropriate
  • Demographics variables and key subscales have been marked as 'commonly used'
  • Missing values have been stripped from the dataset
  • Family medical history variables have been removed from this release, pending a more in-depth cleaning
  • PHI and identifiable variables have either been obfuscated or removed from the dataset
  • Variables sourced from the baseline_lab_questionnaire form have been updated to match exact questionnaire wording
   
remomueller · almost 3 years ago

Special thanks to @mcailler, @kevgleas, and @michellereid for a great job curating and finalizing the CFS dataset and data dictionary! Over 200 issues were closed in the creation of this first release!
