In this blog post, we'll explore the recent changes to the way dataset variables are displayed on the NSRR.
Our variable pages are designed to give you a quick yet comprehensive overview of all the variables available in a dataset. We see these pages as a guide and reference, and we hope they serve you well as you begin your research and exploration of the datasets.
Each variable has an overview page that covers most of what you need to know about the variable itself. The following is a screenshot of the rdi3p variable in the Sleep Heart Health Study (SHHS) dataset.
The screenshot highlights the information available for each variable, including its type, which is one of the following:
identifier- Identifiers are used to link records within a dataset. Occasionally, these identifiers also help link to other publicly available datasets on different websites.
choices- Choices are used for variables that have a domain, a finite set of options.
integer- Integers are used for numeric variables that take only whole-number values.
numeric- Numeric is used for continuous variables that are represented as floating-point numbers (doubles or floats).
string- Strings are used for short text values, typically under 255 characters.
text- Text fields are used for longer passages of text that often span multiple lines.
date- Dates are defined by a year, month, and day, typically in
time- Times are defined by an hour, minute, and second, typically in
datetime- Date times are defined by a date, time, and timezone.
file- Files are represented as a string reference to an associated binary or text file.
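To make the type definitions above concrete, here is a minimal Python sketch of checking whether a raw value fits a given type. The function name conforms_to_type is hypothetical, and the ISO 8601 date and time formats it expects are an assumption for illustration, not the NSRR's documented storage format.

```python
from datetime import date, datetime, time


def conforms_to_type(raw, var_type, domain=None):
    """Return True if the raw string value fits the given variable type.

    Hypothetical helper; the ISO 8601 formats are assumed for illustration.
    """
    try:
        if var_type == "integer":
            int(raw)  # whole numbers only; "4.2" raises ValueError
        elif var_type == "numeric":
            float(raw)  # any floating-point value
        elif var_type == "choices":
            return domain is not None and raw in domain
        elif var_type == "string":
            return len(raw) < 255  # short values
        elif var_type == "text":
            return True  # free-form, may span multiple lines
        elif var_type == "date":
            date.fromisoformat(raw)  # assumed YYYY-MM-DD
        elif var_type == "time":
            time.fromisoformat(raw)  # assumed HH:MM:SS
        elif var_type == "datetime":
            # A date time must carry a timezone offset.
            return datetime.fromisoformat(raw).tzinfo is not None
        elif var_type in ("identifier", "file"):
            return raw != ""  # any non-empty ID or file reference
        else:
            return False  # unknown type
    except ValueError:
        return False
    return True


print(conforms_to_type("2", "choices", domain={"1", "2"}))
```

Returning False instead of raising keeps the check easy to apply across a whole column of values.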
Certain variables, like gender, also have a finite domain that lists all possible choices and values for the variable. In the case of gender, the domain consists of two choices, 1: Male and 2: Female, where each choice is written as its value followed by its name, in value: name format.
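A domain can be thought of as a lookup table from coded values to display names. The sketch below assumes domain entries are written as "value: name" strings; the parse_domain helper and the coded column are hypothetical.

```python
def parse_domain(entries):
    """Map each choice value to its display name from "value: name" strings."""
    domain = {}
    for entry in entries:
        value, name = entry.split(":", 1)  # split on the first colon only
        domain[value.strip()] = name.strip()
    return domain


gender = parse_domain(["1: Male", "2: Female"])
coded = ["2", "1", "1", "2"]  # hypothetical coded column
print([gender[code] for code in coded])
```

Splitting on the first colon only means a choice name that itself contains a colon is kept intact.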
For each dataset, the documentation team selects three or four key variables, and the graphs for every variable are broken down by these core variables. In SHHS, the chosen core variables include Race. Note that different datasets may have a different set of core variable graphs.
Below is a screenshot of our rdi3p variable broken down by these core variables.
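Breaking a variable down by a core variable amounts to grouping its values by the core variable's categories and summarizing each group. This is a minimal sketch of that idea using Python's standard library; summarize_by is hypothetical, the numbers and race codes are toy values rather than SHHS data, and the NSRR graphs may summarize the groups differently.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by(values, groups):
    """Group a numeric variable by a categorical core variable and summarize each group."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        if value is None or group is None:
            continue  # skip records missing either variable
        buckets[group].append(value)
    return {
        group: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for group, vals in sorted(buckets.items())
    }


# Toy values for illustration only; these are not SHHS data.
rdi3p = [5.0, 12.0, 3.0, 20.0, None, 8.0]
race = ["1", "2", "1", "3", "1", "2"]  # hypothetical race codes

for code, stats in summarize_by(rdi3p, race).items():
    print(code, stats)
```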
Variables captured on questionnaires often have an associated form in PDF format. For these variables, we provide a direct link to a blank copy of the form. For example, the ess_s1 variable has a menu item that lists every PDF form on which the variable appears.
The NSRR also generates a list of variables that may be related to the one you are currently viewing, based on shared labels and keywords, descriptions, and calculation fields. These related variables are meant to encourage further exploration of the dataset; they do not signify a correlation between the variables.
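The exact scoring behind related variables isn't described here. As an illustrative assumption only, this sketch ranks variables by the overlap (Jaccard similarity) of their label sets; it ignores descriptions and calculation fields, and all labels, plus every variable name other than rdi3p, are hypothetical.

```python
def jaccard(a, b):
    """Size of the intersection of two label sets divided by the size of their union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_variables(target, catalog, threshold=0.2):
    """Return (name, score) pairs for variables whose labels overlap the target's."""
    scores = [
        (name, jaccard(catalog[target], labels))
        for name, labels in catalog.items()
        if name != target
    ]
    related = [pair for pair in scores if pair[1] >= threshold]
    return sorted(related, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical labels; not taken from the NSRR data dictionary.
catalog = {
    "rdi3p": {"apnea", "hypopnea", "rdi", "oxygen"},
    "ahi_example": {"apnea", "hypopnea", "ahi"},
    "diastolic_example": {"blood-pressure", "diastolic"},
}

print(related_variables("rdi3p", catalog))
```

A shared label only means two variables are documented under similar terms, which matches the point above that related variables do not imply correlation.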
While maintaining and curating datasets on the NSRR, the dataset teams kept track of unusual outliers found by the Spout outlier check and recorded these extreme values in a Known Issues file. During our late-2015 meeting with our early adopters, it became apparent that these known issues were hard to find within the datasets. To better highlight variables that may need extra attention in research, we decided to show a message directly on the variable page whenever a variable has a known issue. Below, you can see that the avg23bpd_s2 overview page flags a known issue. Clicking the Known Issues link presents a detailed description of the issue.
[Screenshot: AVG23BPD_S2 Overview and Known Issues tabs]
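The criteria used by the Spout outlier check aren't covered in this post. As a stand-in, the sketch below flags extreme values with Tukey's fences (values more than 1.5 times the interquartile range beyond the first or third quartile), a common rule for this kind of screening. The function name and the blood-pressure-like readings are hypothetical.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Stand-in technique; not necessarily what the Spout outlier check does.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical diastolic-pressure-like readings, not NSRR data.
readings = [70, 72, 75, 68, 71, 74, 73, 69, 140]
print(tukey_outliers(readings))
```

Values flagged this way are candidates for review, not automatically errors, which is why a human-written Known Issues note is still useful.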
The final section you will see when viewing a variable is the History tab. This tab is perhaps the most important, as it shows whether and how the variable has changed as we've released updated datasets, and what the variable looked like in older versions. The NSRR places a high value on reproducibility, and we hope the history feature gives researchers good insight into the specific dataset versions they are using.
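Conceptually, the History tab compares a variable's definition across dataset releases. A minimal sketch, assuming each version of a variable is stored as a dictionary of fields (diff_variable and all field values here are hypothetical), reports the fields that changed between two releases:

```python
def diff_variable(old, new):
    """Return {field: (old_value, new_value)} for every field that differs."""
    fields = sorted(set(old) | set(new))
    return {f: (old.get(f), new.get(f)) for f in fields if old.get(f) != new.get(f)}


# Hypothetical definitions of one variable in two dataset releases.
older = {"id": "example_var", "type": "integer", "units": "events per hour"}
newer = {"id": "example_var", "type": "numeric", "units": "events per hour",
         "labels": ["apnea"]}

for field, (before, after) in diff_variable(older, newer).items():
    print(f"{field}: {before!r} -> {after!r}")
```

A field missing from one version shows up as None on that side, so added and removed fields are reported alongside modified ones.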