
Harmonizing sleep and circadian data: why is it important?

Guest Blogger: Diego R. Mazzotti, Ph.D., University of Kansas Medical Center


In this post, we discuss some highlights of a Workshop Report recently published in SLEEP¹ by our colleagues at the Sleep Research Network (SRN), a task force of the Sleep Research Society (SRS). The report summarizes a discussion panel held at the World Sleep Congress in Vancouver, Canada, in 2019, which brought together leaders in sleep, circadian sciences, and biomedical informatics. The multidisciplinary nature of the workshop led to a lively discussion of the challenges and potential solutions in harmonizing sleep and circadian data, with the goal of improving clinical research and supporting large-scale, multi-centric clinical trials and observational studies that use real-world data.

What is data harmonization?

There are many ways to observe the world we live in and, not surprisingly, different groups of scientists very commonly come up with different definitions for the same observable phenomena. Moreover, even when there is agreement on definitions, there may be variation in how these concepts are represented, i.e., converted into ‘data’. Data harmonization is the process of curating existing sources of data so that they can be integrated with minimal loss of information. By combining different data sources, data harmonization helps generate knowledge in a more generalizable setting.
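
As a minimal sketch of what this can look like in practice, suppose two hypothetical sites record the same concept, total sleep time, under different names and units. The field names (`tst_min`, `sleep_hours`), the site labels and the target record layout below are assumptions made up for illustration; they do not come from any published standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarmonizedSleepRecord:
    """Common representation chosen for this sketch (hypothetical schema)."""
    participant_id: str
    site: str
    total_sleep_time_min: float  # common unit for this sketch: minutes


def harmonize_site_a(row: dict) -> HarmonizedSleepRecord:
    # Hypothetical site A already stores total sleep time in minutes.
    return HarmonizedSleepRecord(
        participant_id=str(row["id"]),
        site="A",
        total_sleep_time_min=float(row["tst_min"]),
    )


def harmonize_site_b(row: dict) -> HarmonizedSleepRecord:
    # Hypothetical site B stores the same concept in hours, under another name.
    return HarmonizedSleepRecord(
        participant_id=str(row["subject"]),
        site="B",
        total_sleep_time_min=float(row["sleep_hours"]) * 60.0,
    )


# Made-up rows, one per site, purely to exercise the functions.
site_a_rows = [{"id": 101, "tst_min": 412}]
site_b_rows = [{"subject": "B-007", "sleep_hours": 6.5}]

combined = [harmonize_site_a(r) for r in site_a_rows] + [
    harmonize_site_b(r) for r in site_b_rows
]
for record in combined:
    print(record)
```

Once both sources share one representation, they can be pooled and analyzed together without each analyst having to remember which site used which unit.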

Why is data harmonization in the context of clinical research important?

We generate data at an unprecedented scale. In the context of clinical research, we want to make good use of these data to help our patients and to understand the intriguing relationship between sleep, circadian rhythms and health. However, the most impactful studies are those that benefit everyone in our society. While valuable and relevant, small single-center clinical trials often do not represent a patient population in its entirety. This drives the scientific community to pursue collaborative efforts and to design multi-centric studies. This, however, comes with a challenge: more often than not, different research groups and institutions do not have the same protocols for certain study activities, or they use different information technology systems (e.g., electronic health records) to store their data. Thus, it is easy to anticipate that multi-centric studies can become very expensive. Sometimes, even the relatively simple task of identifying eligible participants for a clinical trial can be daunting across different sites. Checking eligibility criteria that depend on detailed diagnostic confirmation (e.g., moderate-severe obstructive sleep apnea without predominant central events) can be time-consuming, and doing it manually may be the only feasible way. When systems exist that can represent key data elements in the sleep and circadian domain, such tasks can be automated and accomplished more efficiently and at a lower cost. When different institutions use harmonized systems to represent their clinical data, the task becomes even easier. Thus, one of the goals of data harmonization is to improve how data are represented, so that they become easy to find, access, exchange and reuse. These properties, often referred to as FAIR (findable, accessible, interoperable, and reusable)², help guide the development of data harmonization efforts, which ultimately contribute to more efficient knowledge generation.
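
To illustrate the kind of automation described above, the sketch below screens harmonized records for the example criterion (moderate-severe obstructive sleep apnea without predominant central events). It assumes every site already exposes an apnea-hypopnea index (AHI) and event counts under shared field names. The record layout, the AHI cutoff of 15 events per hour, and the rule that central events count as "predominant" when they exceed half of all respiratory events are illustrative assumptions, not criteria taken from the workshop report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SleepStudySummary:
    """Hypothetical harmonized summary of one diagnostic sleep study."""
    participant_id: str
    ahi: float               # apnea-hypopnea index, events per hour
    central_events: int      # number of scored central apneas
    total_resp_events: int   # number of all scored respiratory events


def is_eligible(
    study: SleepStudySummary,
    ahi_cutoff: float = 15.0,              # assumed moderate-severe threshold
    central_fraction_cutoff: float = 0.5,  # assumed meaning of "predominant"
) -> bool:
    """Return True if the study meets the assumed screening rule."""
    if study.ahi < ahi_cutoff:
        return False
    if study.total_resp_events == 0:
        # No scored events: the record cannot support the diagnosis.
        return False
    central_fraction = study.central_events / study.total_resp_events
    return central_fraction <= central_fraction_cutoff


# Made-up records standing in for data pooled from several sites.
studies = [
    SleepStudySummary("P1", ahi=22.0, central_events=5, total_resp_events=150),
    SleepStudySummary("P2", ahi=8.0, central_events=0, total_resp_events=50),
    SleepStudySummary("P3", ahi=31.0, central_events=140, total_resp_events=200),
]

eligible_ids = [s.participant_id for s in studies if is_eligible(s)]
print(eligible_ids)
```

The screening rule itself is short. The hard part, and the part harmonization addresses, is getting every site to supply `ahi` and the event counts with the same meaning.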

Questionnaires, actigraphy, polysomnography and the many sources of sleep and circadian data

In sleep and circadian sciences, many complex and heterogeneous sources of data exist. During the workshop, experts in questionnaires, actigraphy and polysomnography (PSG), some of the most prevalent methods to collect data in our domain, presented their views on state-of-the-art data representation methods and discussed the challenges we face as we move towards integrating disparate data sources.

For example, while questionnaires are useful for collecting patient-reported outcomes, they sometimes lack contextual information, which can compromise the accuracy of the data when they are used outside the context in which they were validated. Self-reported “good sleep quality” may be perceived differently by young and older adults. Therefore, collecting contextual information, such as whether respondents are working or retired, may become relevant.

Actigraphy is another important method, with low participant burden and the ability to record over multiple nights. These features have contributed to its use for estimating several sleep and circadian traits of interest. Perhaps one of the greatest challenges in actigraphy data harmonization is the conventional reliance on proprietary software. However, the increasing availability of consumer wearables and open-source software is driving efforts to make the data generation process more transparent, allowing data from different studies to be integrated in a meaningful way. Future efforts to integrate these data into electronic health records (EHR) under FAIR principles could revolutionize how sleep and circadian traits are assessed as part of regular clinical care.

Due to the digital nature of PSG, data representation protocols already exist, such as the European Data Format (EDF; https://www.edfplus.info/). This puts PSG ahead of the curve in terms of data harmonization. The full potential of PSG-derived physiological signals has become apparent thanks to recent advances in signal processing and machine learning methods³. However, many challenges still exist, such as slightly different technical specifications (e.g., sampling rate), lack of channel name conventions, and variability in the annotation formats and terminology used to represent events (e.g., arousals, apneas, arrhythmias). Fortunately, the development of open-source tools, such as the luna software package (http://zzz.bwh.harvard.edu/luna/), developed by the team behind the National Sleep Research Resource (NSRR), has contributed to advances in large-scale processing and integration of signal data across heterogeneous studies. This page (https://gitlab-scm.partners.org/zzz-public/nsrr/-/blob/master/common/harm-principles.md) is a great resource for those interested in learning how the NSRR is applying important data harmonization principles to facilitate PSG data integration.
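
The channel-naming problem in particular lends itself to a short illustration. The sketch below maps site-specific channel labels onto one canonical label per signal and flags channels whose sampling rate differs from a chosen target. The alias table, the canonical names and the 128 Hz target are hypothetical choices for this example; they are not the conventions used by luna or the NSRR.

```python
from typing import Optional

# Hypothetical alias table: canonical label -> lower-case labels seen at sites.
CHANNEL_ALIASES = {
    "EEG_C3": {"c3", "c3-m2", "eeg c3"},
    "EEG_C4": {"c4", "c4-m1", "eeg c4"},
    "AIRFLOW": {"airflow", "flow", "nasal flow"},
}


def canonical_label(raw_label: str) -> Optional[str]:
    """Return the canonical label for a raw channel name, or None if unknown."""
    key = raw_label.strip().lower()
    for canonical, aliases in CHANNEL_ALIASES.items():
        if key in aliases:
            return canonical
    return None


def harmonize_channels(channels: dict, target_rate_hz: float = 128.0):
    """Rename known channels and report what still needs attention.

    `channels` maps each raw channel label to its sampling rate in Hz.
    Returns (harmonized, unmapped, needs_resampling).
    """
    harmonized = {}
    unmapped = []
    needs_resampling = []
    for raw, rate in channels.items():
        label = canonical_label(raw)
        if label is None:
            unmapped.append(raw)  # keep for manual review, do not guess
            continue
        harmonized[label] = rate
        if rate != target_rate_hz:
            needs_resampling.append(label)
    return harmonized, unmapped, needs_resampling


# Made-up header information from one recording.
recording = {"C3-M2": 256.0, "Flow": 128.0, "Snore Mic": 500.0}
harmonized, unmapped, needs_resampling = harmonize_channels(recording)
print(harmonized)
print(unmapped)
print(needs_resampling)
```

Unknown labels are reported rather than silently dropped or guessed, so that a person can extend the alias table deliberately.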

Creating a favorable sleep and circadian data ecosystem

To harness the full potential of standardized data harmonization practices, harmonized data also need to be accessible and interoperable. The NSRR is the leading example of this potential, aggregating and harmonizing data from several observational cohorts and clinical trials. Yet, this represents only one part of our sleep and circadian data ecosystem. Other data sources, such as clinical data obtained from the EHR and personal health data generated by consumer wearables, offer unprecedented opportunities in our field. Clinical research networks enabled by institutions such as the National Patient-Centered Clinical Research Network (PCORnet; https://pcornet.org/) and the Observational Health Data Sciences and Informatics (OHDSI; https://www.ohdsi.org/) have been aggregating clinical EHR data for many years. However, due to the lack of comprehensive representation of sleep and circadian data within the terminology systems used by these efforts⁴, these data may not be readily usable. Minimizing the gap between clinical data generation (e.g., in a sleep laboratory) and clinical data representation (e.g., having terminology systems that contain sleep and circadian terms) is essential to incorporate EHR-based clinical sleep data into our data ecosystem. It is critical for the sleep and circadian biology communities to work together with biomedical informaticists to ensure that structured language about sleep and circadian traits is well represented in data-enabled clinical research networks.

A suggested roadmap and the role of the Sleep Research Network

The workshop also suggested a roadmap to facilitate harmonization and adoption of standardized practices for both research and clinical data. It includes the following action items:

  • Establish processes to facilitate the acquisition of standardized data, including technical and methodological specifications and detailed metadata
  • Encourage researchers to use well-documented and open data dictionaries, ideally mapped to controlled clinical terminologies
  • Improve and maintain clinical terminologies so that high-quality sleep and circadian data can be well represented both in clinical and research contexts
  • Encourage vendors of sleep technologies to establish transparent protocols for data representation, processing and sharing, including access to raw signal data to allow effective validation against other methods
  • Encourage tool developers to provide open-source “research use only” versions of their algorithms, which could then be assessed and validated in larger datasets
  • Create frameworks to facilitate incorporation of new types of sleep data into the EHR, such as continuous positive airway pressure adherence and wearables data
  • Encourage national and international societies to provide guidance and education to the community regarding data sharing and associated protocols
  • Encourage funding agencies to support technological development of standardized processes for data sharing and harmonization, including incentives and objective evaluation of data sharing quality

The Sleep Research Network is working towards fulfilling these steps, in collaboration with the NSRR.

Get involved!

Are you interested in learning how to incorporate sleep and circadian data harmonization principles in your studies? Do you participate in a clinical research network that collects sleep and circadian data and would like to have your data interoperable with other existing resources? Get in touch with the Sleep Research Network via coordinator@srsnet.org to learn more!

You can also hear Dr. Mazzotti speak about data harmonization on the Sleep Research Society Podcast (#5): Sleep and Circadian Informatics Data Harmonization - A workshop report from the Sleep Research Society and Sleep Research Network


  1. Mazzotti, D. R. et al. Sleep and Circadian Informatics Data Harmonization: A Workshop Report from the Sleep Research Society and Sleep Research Network. Sleep, doi:10.1093/sleep/zsac002 (2022).
  2. Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018, doi:10.1038/sdata.2016.18 (2016).
  3. Lim, D. C. et al. Reinventing polysomnography in the age of precision medicine. Sleep Medicine Reviews 52, doi:10.1016/j.smrv.2020.101313 (2020).
  4. Mazzotti, D. R. Landscape of biomedical informatics standards and terminologies for clinical sleep medicine research: A systematic review. Sleep Medicine Reviews 60, 101529, doi:10.1016/j.smrv.2021.101529 (2021).
By ksparks on October 3, 2022 in Guest Blogger