SHHS1 - Consistency with publications

Philip Terrill


I'm developing and evaluating a number of measurements of OSA-related pathology and comparing them to standard measurements (i.e. AHI) in their ability to predict long-term outcomes (in this case, all-cause mortality). As such, I'm attempting to replicate elements of previously published analyses.

I have a couple of questions to tie up a few loose ends:

  1. Sample size of available data for SHHS1:

The sample size for SHHS1 was approximately 6400. However, the data available through NSRR includes approximately 5800 participants. Was this because one of the parent cohorts didn't have data sharing permission built into the original consent?

  2. What are the "primary" variables coding prevalent cardiovascular disease at baseline?

Similar information appears to be coded in variables from different sources. In particular, there are cardiovascular history variables from: (i) the parent cohorts (i.e. prev_mi, prev_stk, etc.), although approximately 20% of patients have missing data for these variables (presumably from whole parent cohorts); and (ii) the questionnaires from patient recruitment to SHHS (i.e. MI15, STROKE15, etc.).

The latter is appealing because data are available for almost all the patients, but I'm not sure if these were considered the primary information about cardiovascular disease history in the original study design. Ultimately, it is probably most important that the variables I use are consistent with previous literature. Do you know which ones match the variables used in key publications (particularly Punjabi et al., PLoS Med., 2009, and Redline et al., AJRCCM, 2010)?

  3. Which AHI?

Similarly, which AHI variable was used by these key publications? My guess based on methods sections of papers is ahi_a0h4, but it would be great to confirm if possible.

Cheers and thanks,

Phil Terrill

mrueschman


Great questions. Thanks for taking such care in your handling and analysis of the NSRR data. I will look into the variable questions and get back to you next week.

As for the sample sizes, the difference of ~600 (6,400 -> 5,800) arose from the removal of participants from the Strong Heart Study cohort. We include this note in the dataset introduction:

Note: Due to sovereignty issues, Strong Heart Study participants are not included in the shared SHHS data. Data from a total of 5804 participants (1915 ARIC, 1230 CHS, 688 Framingham Offspring and 1971 from other studies) consenting to share data are available.
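
As a quick arithmetic check on the figures in that note, the per-cohort counts can be summed and compared with the approximate original enrollment. This is only a sketch: the 6400 figure is the rough number quoted in the question, not an exact enrollment count.

```python
# Cohort counts quoted in the dataset note above.
shared_counts = {
    "ARIC": 1915,
    "CHS": 1230,
    "Framingham Offspring": 688,
    "Other studies": 1971,
}
shared_total = sum(shared_counts.values())

# Approximate full SHHS1 enrollment from the question (assumed, not exact).
approx_full_cohort = 6400
approx_excluded = approx_full_cohort - shared_total

print(shared_total)     # 5804
print(approx_excluded)  # 596, i.e. roughly 600
```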

Thanks again for using the site!

mrueschman

I cannot speak with absolute certainty since I wasn't involved in those analyses, but here is my take on your other questions:

Prevalent CVD at Baseline

I believe the variables from the SHHS1 Health Interview Form (e.g. STROKE15 and MI15) are the ones that have typically been used to gauge prevalence at the baseline visit. That said, from Punjabi 2009 it looks like they might have used a combination of the self-report medical history and the adjudicated outcomes data (variables in the "CVD Outcomes" folder).

Prevalent cardiovascular disease was defined as history of physician-diagnosed angina, heart failure, myocardial infarction, stroke, and coronary revascularization, and was determined by adjudicated surveillance data provided by the parent cohorts or by self-report at enrollment.
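
One way to read the "either source" wording in that definition is as a logical OR across the available indicators. The sketch below is an illustration only, not the method used in the paper. It assumes each variable is coded 1 = yes, 0 = no, with missing values as None, and it uses only the variable names mentioned in this thread (MI15, STROKE15, prev_mi, prev_stk); the angina, heart failure, and revascularization variables would need to be added, and the actual coding should be checked against the data dictionary.

```python
from typing import Mapping, Optional

# Variables named in this thread; the other conditions in the definition
# (angina, heart failure, revascularization) are deliberately left out.
SELF_REPORT_VARS = ("MI15", "STROKE15")
PARENT_COHORT_VARS = ("prev_mi", "prev_stk")


def prevalent_cvd(record: Mapping[str, Optional[int]]) -> Optional[bool]:
    """Combine indicators with a logical OR (assumed coding: 1 = yes, 0 = no)."""
    values = [record.get(name) for name in SELF_REPORT_VARS + PARENT_COHORT_VARS]
    if any(v == 1 for v in values):
        return True   # at least one source reports the condition
    if all(v is None for v in values):
        return None   # nothing recorded in either source
    return False      # at least one source says "no" and none says "yes"


print(prevalent_cvd({"MI15": 0, "STROKE15": 0, "prev_mi": 1}))  # True
```

Treating a mix of "no" and missing values as "no" is a design choice; a stricter analysis might treat those participants as missing instead.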

AHI Variable of Interest

'ahi_a0h4' will certainly have been used regularly since it's a pretty standard AHI definition. I noted that Redline 2010 references "OAHI", and we have a variable named OAHI in the dataset. The paper doesn't mention central apneas, so I'm guessing 'OAHI' was used. 'OAHI' mirrors the more recently created 'ahi_o0h4'. (There are slight sample size differences due to filtering on PSG signal quality in 'OAHI' at SHHS1.)

Hope this is helpful. Please keep us up to date on your progress!

Philip Terrill

Thanks for that.

I'll do some sensitivity analysis to make sure that there is consistency between similar variables anyway.
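
A minimal sketch of what such a consistency check might look like for a pair of binary variables (for example prev_mi against MI15): cross-tabulate the two among participants with both values present and report the proportion that agree. The 0/1 coding, the use of None for missing values, and the example values are all assumptions for illustration, not real SHHS data.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

Pair = Tuple[Optional[int], Optional[int]]


def agreement(pairs: Iterable[Pair]) -> Tuple[Counter, Optional[float]]:
    """Cross-tabulate two 0/1 variables over records where both are present."""
    table = Counter((a, b) for a, b in pairs if a is not None and b is not None)
    n = sum(table.values())
    agree = table[(0, 0)] + table[(1, 1)]
    # Proportion of complete pairs on which the two variables agree.
    return table, (agree / n if n else None)


# Made-up values, purely illustrative: (prev_mi, MI15) per participant.
example = [(0, 0), (1, 1), (0, 1), (None, 0), (1, 1)]
table, proportion = agreement(example)
print(dict(table))
print(proportion)  # 0.75
```

The off-diagonal cells of the table, (0, 1) and (1, 0), show where the two sources disagree and are the ones worth inspecting.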

cheers, Phil

