Release v0.18.0 Overview - Part 2

Posted by remomueller on February 8, 2016 in Releases

In this blog post, we'll explore the recent changes to the way dataset variables are displayed on the NSRR.

Variable Updates

Our variable pages are designed to give you a quick and comprehensive look at all the elements available in a dataset. We see these pages as a guide and reference for you, and hope that they serve you well as you start your research and exploration of the datasets.

The Overview

Each variable has an overview page that covers most of what you will need to know about the variable itself. The following is a screenshot of the rdi3p variable in the Sleep Heart Health Study (SHHS) dataset.

The screenshot highlights the information we have available for specific variables:

  • Calculation - A calculation is provided for any variable that has been derived from other variables in the dataset. Clicking on a variable in the formula leads you to that component variable.
  • Units - Many variables have an associated unit that defines how they were initially captured. Some variables may be captured in metric units, while others may use imperial units.
  • Type - The NSRR uses type as a generic categorization for variables. A type can be one of the following:
    • identifier - Identifiers are used to link records within a dataset. Occasionally these identifiers also help link with other publicly available datasets on different websites.
    • choices - Choices are used for variables that have a domain or finite set of options.
    • integer - Integers are used for numeric variables that only take whole-number values.
    • numeric - Numerics are used for continuous variables that are represented as doubles or floats.
    • string - String fields are used for short fields of words typically under 255 characters.
    • text - Text fields are used for longer fields of words that often span multiple lines.
    • date - Dates are defined by a year, month, and day, typically in YYYY-MM-DD format.
    • time - Times are defined by an hour, minute, and second, typically in HH:MM:SS format.
    • datetime - Date times are defined by a date, time, and timezone.
    • file - Files are represented as a string reference to an associated binary or text file.
  • Overview Graph - The graph on the overview is a histogram of the variable, plotted across visits for datasets with more than one visit.
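To make the type list above more concrete, here is a minimal Python sketch of how a raw value might be checked against a few of these types. The type names come from the list above, but the `looks_like` helper and its rules are our own assumptions for illustration and do not describe how the NSRR site or Spout validates data.

```python
from datetime import datetime

# The ten type names from the list above.
VARIABLE_TYPES = (
    "identifier", "choices", "integer", "numeric", "string",
    "text", "date", "time", "datetime", "file",
)


def looks_like(type_name, raw):
    """Return True if the raw string is plausible for the given type.

    Hypothetical helper: only integer, numeric, date, and time are
    actually checked; every other known type is accepted as-is.
    """
    if type_name not in VARIABLE_TYPES:
        raise ValueError(f"unknown variable type: {type_name}")
    try:
        if type_name == "integer":
            int(raw)
        elif type_name == "numeric":
            float(raw)
        elif type_name == "date":
            datetime.strptime(raw, "%Y-%m-%d")  # YYYY-MM-DD
        elif type_name == "time":
            datetime.strptime(raw, "%H:%M:%S")  # HH:MM:SS
    except ValueError:
        return False
    return True
```

For example, `looks_like("integer", "12.5")` is false because 12.5 is not a whole number, while `looks_like("numeric", "12.5")` is true.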

Certain variables, like gender, also have a finite domain that lists all possible choices and values for a specific variable.

In the case of gender, the domain consists of two choices, 1: Male and 2: Female, where each choice is written in value: name format.
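If you are working with exported data, the value: name format is easy to turn into a lookup table. The following is a small Python sketch, assuming the choices are available as plain strings. The `parse_domain` and `label_for` helpers are hypothetical names of our own, and the "unknown" fallback is an arbitrary choice.

```python
def parse_domain(lines):
    """Map each choice value to its name from 'value: name' strings."""
    domain = {}
    for line in lines:
        # Split on the first colon only, so names may contain colons.
        value, name = line.split(":", 1)
        domain[value.strip()] = name.strip()
    return domain


def label_for(domain, raw_value):
    """Look up the display name for a raw value (assumed fallback: 'unknown')."""
    return domain.get(str(raw_value), "unknown")


# The gender domain described above.
gender_domain = parse_domain(["1: Male", "2: Female"])
```

With this in place, `label_for(gender_domain, 2)` gives "Female".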

Core Variable Graphs

For each dataset, the dataset documentation team selects three or four key variables that are used to generate graphs for all variables. In SHHS, the documentation team chose Age, Gender, and Race. Note that different datasets may have a different set of core variable graphs.

Below is a screenshot of our rdi3p variable broken down by gender.
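The idea behind a core variable graph is a simple group-by: split the values of one variable by the values of a core variable, then summarize each group. Here is a rough Python sketch of that idea. The records are made up for illustration and are not actual SHHS values, and the `breakdown` function is a hypothetical helper rather than the code that produces the graphs on the site.

```python
from collections import defaultdict
from statistics import median

# Made-up records for illustration only; not actual SHHS values.
records = [
    {"gender": 1, "rdi3p": 4.0},
    {"gender": 1, "rdi3p": 10.0},
    {"gender": 2, "rdi3p": 2.0},
    {"gender": 2, "rdi3p": 6.0},
    {"gender": 2, "rdi3p": 7.0},
]


def breakdown(rows, variable, core_variable):
    """Group one variable by a core variable and summarize each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[core_variable]].append(row[variable])
    return {
        key: {"n": len(values), "median": median(values)}
        for key, values in groups.items()
    }


by_gender = breakdown(records, "rdi3p", "gender")
```

Swapping "gender" for another core variable gives the corresponding breakdown without changing the function.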

Variables on Forms

Variables that are captured on questionnaires often have an associated form in PDF format. For these variables, we provide a direct link to a blank version of the form. For example, the ess_s1 variable has a menu item that lists all the PDFs on which the variable is found.

Related Variables

The NSRR also generates a list of variables that may be related to the variable you are currently viewing, based on common labels, keywords, descriptions, and calculation fields. These related variables are there to encourage further exploration of the dataset, not to signify a correlation between the variables.
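One simple way to picture this kind of matching is to count the words two variables share in their descriptive text. The Python sketch below does exactly that. It is only an illustration under our own assumptions: the catalog entries are invented, the `related` function and its `minimum_shared` threshold are hypothetical, and the actual matching on the NSRR may work differently.

```python
import re


def tokens(text):
    """Lowercase a piece of text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def related(target, catalog, minimum_shared=2):
    """List other variables that share enough words with the target.

    No stop-word filtering is done, so common words such as "of" count.
    Results are ordered by number of shared words, then by name.
    """
    target_words = tokens(catalog[target])
    scored = []
    for name, description in catalog.items():
        if name == target:
            continue
        shared = target_words & tokens(description)
        if len(shared) >= minimum_shared:
            scored.append((len(shared), name))
    return [name for _, name in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Invented variables and descriptions, for illustration only.
catalog = {
    "sleep_hours": "average hours of sleep per night",
    "nap_hours": "average hours of daytime naps",
    "height_cm": "standing height in centimeters",
}
```

Here `related("sleep_hours", catalog)` returns only `nap_hours`, since the two descriptions share several words while `height_cm` shares none.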

Known Issues

While maintaining and curating datasets on the NSRR, the dataset teams kept track of strange variable outliers discovered by the Spout outlier check, and noted these extreme values in a Known Issues file. During our late 2015 meeting with our early adopters, it became apparent that the known issues weren't readily visible in the datasets. To better highlight variables that may require more attention when used in research, we decided to show a message directly on the variable page when a variable has a known issue. Below you can see that avg23bpd_s2 is flagged with a known issue directly on the variable's overview page. Clicking on the Known Issues link brings up a detailed description of the issue.

AVG23BPD_S2 Overview

AVG23BPD_S2 Overview Known Issues

Variable History

The final section you will see when viewing a variable is the History tab. This tab is perhaps the most important, as it lets you see how (and if) the variable has been changed or modified as we've released updated datasets, and what the variable looked like in older versions. The NSRR values reproducibility very highly, and hopes that the history feature will give good insight to researchers using specific versions of the dataset.
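Conceptually, a history entry boils down to comparing a variable's definition between two dataset versions and listing the fields that differ. The following Python sketch shows that comparison. The two definitions are invented examples, and `changed_fields` is a hypothetical helper, not the code behind the History tab.

```python
def changed_fields(older, newer):
    """Return {field: (old, new)} for every field that differs.

    A field missing from one version is reported with None on that side.
    """
    fields = sorted(set(older) | set(newer))
    return {
        field: (older.get(field), newer.get(field))
        for field in fields
        if older.get(field) != newer.get(field)
    }


# Invented definitions of the same variable in two dataset versions.
older_version = {"label": "Body weight", "units": "pounds", "type": "numeric"}
newer_version = {"label": "Body weight", "units": "kilograms", "type": "numeric"}

changes = changed_fields(older_version, newer_version)
```

In this example only the units differ, so `changes` contains a single entry for `units`.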

We hope you enjoyed reading this blog post, and if you want to provide us feedback or continue the conversation, please reach out to us on our forum or send us an email at support@sleepdata.org.
