A minor comment/request: would it be possible to harmonize channel names (and units) within a study, across all EDFs?
For example, in the SHHS study, the first EEG channel is always "EEG". The second one is either "EEG(sec)", "EEG2", or "EEG(SEC)". As channels are not necessarily in the same order across EDFs, it is useful to extract channels by their labels. To facilitate automated processing across 1000s of EDFs, ideally labels would be similar (within a study, at least). A similar principle applies to the units -- e.g. CHAT C3 & C4 channels are sometimes uV, sometimes mV for different EDFs.
Beyond these minor issues, I wonder whether it may be desirable to post harmonized EDFs that also have some basic level of artifact correction or flagging of clearly aberrant epochs, i.e. to perform centrally some of the core steps that most subsequent users of the data would otherwise presumably be performing themselves. On the other hand, I can see the value in retaining exact "archival" versions of datasets, warts and all, for other reasons.
Thank you for your comments and request. An update on NSRR activities and some personal comments follow.
Harmonizing EDF labels and identifying aberrant epochs are areas that we are working on directly through NSRR efforts or through signal processing research projects.
We have been working on data consistency issues that arise when doing large-scale analyses. Tools for identifying inconsistencies have been developed and are being tested internally. The data consistency checkers generate Excel output that can be used to create a batch analysis file, which includes the parameters that are not consistent within a study (e.g., signal labels and sampling rates). We are happy to make these tools available to the general community as they become more robust. We are also happy to share the code as is with those willing to jump right in with code that is still under development.
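To give a rough idea of what such a check looks like, here is a minimal sketch. It is not the NSRR tool: the function name, the input layout (a dictionary of per-file signal labels and sampling rates that you have already read from the EDF headers) and the example file names are all assumptions made for illustration.

```python
from collections import defaultdict


def find_inconsistencies(headers):
    """Report signal-label sets and sampling rates that differ across files.

    `headers` is a hypothetical structure: {filename: {signal_label: rate_hz}},
    assumed to have been extracted from the EDF headers beforehand.
    """
    files_by_label_set = defaultdict(list)
    files_by_rate = defaultdict(lambda: defaultdict(list))

    for filename, signals in headers.items():
        files_by_label_set[frozenset(signals)].append(filename)
        for label, rate in signals.items():
            files_by_rate[label][rate].append(filename)

    report = {}
    # More than one distinct set of labels means labels are not uniform.
    if len(files_by_label_set) > 1:
        report["label_sets"] = {
            tuple(sorted(labels)): sorted(files)
            for labels, files in files_by_label_set.items()
        }
    # A label recorded at more than one sampling rate is also flagged.
    mixed_rates = {
        label: {rate: sorted(files) for rate, files in by_rate.items()}
        for label, by_rate in files_by_rate.items()
        if len(by_rate) > 1
    }
    if mixed_rates:
        report["sampling_rates"] = mixed_rates
    return report


# Illustrative input only; the file names and rates are made up.
example_headers = {
    "file_a.edf": {"EEG": 125, "EEG(sec)": 125},
    "file_b.edf": {"EEG": 125, "EEG2": 125},
    "file_c.edf": {"EEG": 128, "EEG(sec)": 128},
}
report = find_inconsistencies(example_headers)
```

The resulting dictionary could then be written out as a spreadsheet or used to build a batch file listing the per-file parameters.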
I have shifted my personal development of large-scale signal processing applications so that they require neither consistent signal labels nor consistent sampling rates. For example, the spectral analysis pipeline automatically converts the signal units to uV prior to analysis. This has allowed us to reduce the data harmonization required during cross-study analyses with only an incremental upfront development cost. I have found this approach preferable to creating and maintaining copies of multiple cohort studies.
We have begun identifying studies that are not recommended for EEG analyses as part of NSRR EEG spectral analysis activities. A list of the studies included in and excluded from spectral analysis will be posted as cohort spectral analysis results are posted. Individual-subject spectral analysis output files include a flag identifying epochs as artifacts. These output files could be made available to the community on request.
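As an illustration of handling this at load time rather than by rewriting the EDFs, the sketch below maps the label variants mentioned in the question to one canonical name and rescales samples to uV. It is an assumption-laden example, not the pipeline's actual code: the names `canonical_label` and `to_microvolts`, the alias table, and the choice of "EEG2" as the canonical label are all hypothetical.

```python
import re

# Variants of the second EEG channel reported in the question
# ("EEG(sec)", "EEG2", "EEG(SEC)"), compared after upper-casing.
# The canonical name "EEG2" is an arbitrary choice for this sketch.
SECOND_EEG_ALIASES = {"EEG(SEC)", "EEG2"}

# Multipliers to microvolts for the unit strings this sketch recognizes.
UNIT_TO_UV = {"uv": 1.0, "mv": 1_000.0, "v": 1_000_000.0}


def canonical_label(label):
    """Strip whitespace, ignore case, and collapse known aliases."""
    key = re.sub(r"\s+", "", label).upper()
    return "EEG2" if key in SECOND_EEG_ALIASES else key


def to_microvolts(samples, unit):
    """Return samples rescaled to uV; raise on an unrecognized unit."""
    key = unit.strip().lower()
    if key not in UNIT_TO_UV:
        raise ValueError(f"unrecognized unit: {unit!r}")
    factor = UNIT_TO_UV[key]
    return [value * factor for value in samples]
```

With helpers like these, a channel can be looked up by `canonical_label(...)` and passed through `to_microvolts(...)` regardless of how a particular EDF labels it or which unit it declares.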
Please feel free to email me for additional details or to request code.