I am interested in conducting genetic analyses using Sleep Heart Health Study (SHHS) data and would appreciate recommendations on where to begin.
I see that there is a dbGaP page for SHHS data, but as far as I can tell it is only available for those recruited from Framingham (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000137.v10.p5).
I am hoping to derive my outcome variables from the raw SHHS polysomnography data available.
Here are my questions:
Thanks for inquiring and for your interest in NSRR/dbGaP data. I am going to ping a couple of others on the NSRR team who are more familiar with dbGaP and ask them to comment if they have additional information.
Here's my best attempt at answering your questions:
Good luck, and please let us know if we can assist further!
Genotype data are available for three SHHS parent cohorts: ARIC, CHS, and FHS. No jointly genotyped data spanning these cohorts are available, but some consortium data (e.g., publicly released versions of CHARGE WES/WGS and TOPMed WGS) may be more amenable to merging, though these too are split at the dbGaP study level.
Depending on your goals, it may make sense to work with summary data already deposited in dbGaP. For PSG, your best options are:
ARIC: pht004228.v2.p1 (uc6453)
CHS: pht003699.v2.p1 (SHHS1_PSG)
FHS: pht000395.v9.p11 (sleep1_1998s) [Offspring only, no genotypes for the Omni cohort]
Other datasets also exist (e.g., SHHS2 PSG, questionnaire data). You can search for the pht number (e.g., 'pht000395') from the main dbGaP page for more information on a particular dataset.
If these variables don't suit your needs, you can use ID translations that have been set up by the NSRR team working with dbGaP and the parent cohorts (I'm not personally involved in this). This is still in progress, and there's no guarantee that links will be in place for all cohorts, as the parent studies need to sign off. This has been coordinated with dbGaP, so using these links should not compromise study ID protections. A general introduction is here: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/document.cgi?study_id=phs000287.v6.p1&phd=5259 .
For CHS, you can use pht005388 (nsrr_chs_id_link2).
For FHS, you can use pht007767 (nsrrid_fhsdbgapid_link_shareid). Note here that SHARe IDs are a secondary dbGaP ID set.
You'd then need to work within dbGaP's ID structure. There are subject-sample mapping files that link a single participant ID to one or more sample IDs (e.g., FHS may have different sample IDs for Affy 500k, Omni 5M, etc. for the same person's unique dbGaP subject ID).
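To make the chaining concrete, here is a minimal Python sketch of going from an NSRR ID to a dbGaP subject ID (via a link file such as pht005388 or pht007767) and then to one or more sample IDs (via the subject-sample mapping file). It assumes both files were downloaded as tab-delimited text with '#' comment lines at the top, as dbGaP phenotype files typically are. The `nsrrid` column name is a hypothetical placeholder, and `dbGaP_Subject_ID`/`dbGaP_Sample_ID` are assumed names; check each file's data dictionary for the actual columns. The toy IDs below are made up.

```python
import csv
import io
from collections import defaultdict


def read_dbgap_table(text):
    """Parse tab-delimited text, skipping blank lines and '#' comment lines.

    Returns a list of dicts keyed by the first non-comment (header) row.
    """
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    return list(csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t"))


def map_nsrr_to_samples(link_rows, sample_rows,
                        nsrr_col="nsrrid",               # hypothetical column name
                        subject_col="dbGaP_Subject_ID",  # assumed column name
                        sample_col="dbGaP_Sample_ID"):   # assumed column name
    """Chain NSRR ID -> dbGaP subject ID -> list of dbGaP sample IDs.

    One subject can have several samples (e.g. one per genotyping platform),
    so each NSRR ID maps to a list; subjects with no samples map to [].
    """
    samples_by_subject = defaultdict(list)
    for row in sample_rows:
        samples_by_subject[row[subject_col]].append(row[sample_col])

    return {row[nsrr_col]: list(samples_by_subject.get(row[subject_col], []))
            for row in link_rows}


# Toy example with made-up IDs: subject 1001 has two samples (two platforms).
link_text = (
    "# toy link file\n"
    "nsrrid\tdbGaP_Subject_ID\n"
    "200001\t1001\n"
    "200002\t1002\n"
)
sample_text = (
    "# toy subject-sample mapping\n"
    "dbGaP_Subject_ID\tdbGaP_Sample_ID\n"
    "1001\t5001\n"
    "1001\t5002\n"
    "1002\t5003\n"
)

links = read_dbgap_table(link_text)
samples = read_dbgap_table(sample_text)
nsrr_to_samples = map_nsrr_to_samples(links, samples)
print(nsrr_to_samples)
```

Keeping a list of sample IDs per participant (rather than a single ID) reflects the one-subject-to-many-samples structure described above; you'd still need to pick the sample matching the genotype release you're actually using.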
We may have similar goals (I've run multiple GWAS on these and other datasets). If you're interested in potentially collaborating, I'd be happy to discuss offline.