Hi,
This is not a question, just a small note. Maybe it will be useful to someone.
I needed to select the shhs1 records with AHI < 5. There is no AHI variable in shhs1-dataset-0.7.0.csv, but it can be calculated as AHI = cai4p + oahi.
cai4p - https://sleepdata.org/datasets/shhs/variables/cai4p, oahi - https://sleepdata.org/datasets/shhs/variables/oahi.
However, there are 698 missing oahi values and 1398 missing cai4p values. Both variables are defined as follows:
oahi = 60 * ( hrembp4 + hrop4 + hnrbp4 + hnrop4 + oarbp + oarop + oanbp + oanop ) / slpprdp,
cai4p = 60 * ( carbp4 + carop4 + canbp4 + canop4 ) / slpprdp.
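Both formulas have the same shape: 60 times a sum of event counts, divided by slpprdp. The factor 60 suggests slpprdp is sleep time in minutes, so each index is events per hour of sleep. Because the denominators match, AHI = cai4p + oahi is simply 60 times the total of all twelve counts divided by slpprdp. A small worked sketch with made-up counts (the helper `events_per_hour` is hypothetical, not part of the dataset):

```python
def events_per_hour(event_counts, sleep_minutes):
    """Hypothetical helper: 60 * (sum of event counts) / sleep time in minutes."""
    return 60 * sum(event_counts) / sleep_minutes

# Made-up record with 6 hours (360 minutes) of sleep
obstructive = [3, 5, 2, 4, 1, 0, 2, 1]  # hrembp4, hrop4, hnrbp4, hnrop4, oarbp, oarop, oanbp, oanop
central = [1, 0, 2, 0]                  # carbp4, carop4, canbp4, canop4

oahi = events_per_hour(obstructive, 360)  # 60 * 18 / 360 = 3.0
cai4p = events_per_hour(central, 360)     # 60 * 3 / 360 = 0.5
ahi = cai4p + oahi                        # 3.5, same as events_per_hour(obstructive + central, 360)
print(oahi, cai4p, ahi)
```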
So if oahi or cai4p is missing, we would expect at least one of its component variables (or slpprdp) to be missing too, but none of them are. Thus, cai4p, oahi and AHI can be calculated for every record in shhs1. Here is the Python code for it:
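One way to check this claim row by row is to flag records where the stored value is missing but every component is present. One caveat: pandas `DataFrame.sum` skips NaN by default, so a plain row-wise sum turns a missing component into 0 instead of NaN and can hide missing inputs; `skipna=False` keeps them visible. A minimal sketch on a made-up frame (the helper `recoverable_rows` is hypothetical; column names follow the code in this post):

```python
import numpy as np
import pandas as pd

def recoverable_rows(data, stored, components):
    """Hypothetical helper: True where `stored` is missing but all components are present."""
    return data[stored].isnull() & data[components].notnull().all(axis=1)

cols = ['CARBP4', 'CAROP4', 'CANBP4', 'CANOP4']
toy = pd.DataFrame({          # made-up values, not SHHS data
    'cai4p':  [0.5, np.nan, np.nan],
    'CARBP4': [1, 2, np.nan],
    'CAROP4': [0, 1, 0],
    'CANBP4': [2, 0, 1],
    'CANOP4': [0, 0, 0],
})

mask = recoverable_rows(toy, 'cai4p', cols)
print(mask.tolist())                                # [False, True, False]

# Default sum: the NaN in row 2 is silently treated as 0
print(toy[cols].sum(axis=1).tolist())               # [3.0, 3.0, 1.0]
# skipna=False: the missing component propagates as NaN
print(toy[cols].sum(axis=1, skipna=False).tolist())
```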
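import pandas as pd
data = pd.read_csv('shhs1-dataset-0.7.0.csv')
# Missing values in the stored indices
print('Number of missing values in cai4p:', data['cai4p'].isnull().sum())
print('Number of missing values in oahi:', data['oahi'].isnull().sum())
# Recompute both indices from their components (events per hour of sleep).
# skipna=False makes a missing component show up as NaN instead of being counted as 0.
cai4p = 60 * data[['CARBP4', 'CAROP4', 'CANBP4', 'CANOP4']].sum(axis=1, skipna=False) / data['SlpPrdP']
oahi = 60 * data[['HREMBP4', 'HROP4', 'HNRBP4', 'HNROP4', 'OARBP', 'OAROP', 'OANBP', 'OANOP']].sum(axis=1, skipna=False) / data['SlpPrdP']
# Missing values in the recomputed indices
print('Number of missing values in recomputed cai4p:', cai4p.isnull().sum())
print('Number of missing values in recomputed oahi:', oahi.isnull().sum())
AHI = cai4p + oahi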
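To finish the original task (records with AHI < 5), the computation and the filter can be wrapped together. This is a sketch, assuming the column names used in the code above; `select_low_ahi` and the toy frame are hypothetical. Rows whose AHI is undefined (for example a missing SlpPrdP) compare as False against the threshold and are therefore dropped.

```python
import numpy as np
import pandas as pd

CENTRAL = ['CARBP4', 'CAROP4', 'CANBP4', 'CANOP4']
OBSTRUCTIVE = ['HREMBP4', 'HROP4', 'HNRBP4', 'HNROP4', 'OARBP', 'OAROP', 'OANBP', 'OANOP']

def select_low_ahi(data, threshold=5.0):
    """Hypothetical helper: recompute AHI = cai4p + oahi and keep rows with AHI < threshold."""
    sleep = data['SlpPrdP']
    cai4p = 60 * data[CENTRAL].sum(axis=1, skipna=False) / sleep
    oahi = 60 * data[OBSTRUCTIVE].sum(axis=1, skipna=False) / sleep
    out = data.assign(AHI=cai4p + oahi)
    # NaN < threshold evaluates to False, so undefined AHI values are excluded
    return out[out['AHI'] < threshold]

# Made-up example: all counts zero except HROP4
toy = pd.DataFrame({c: [0, 0, 0] for c in CENTRAL + OBSTRUCTIVE})
toy['HROP4'] = [12, 30, 12]
toy['SlpPrdP'] = [360, 360, np.nan]

low = select_low_ahi(toy)
print(low['AHI'].tolist())  # [2.0]: row 1 has AHI 5.0 (not < 5), row 2 has no sleep time
```

On the real file this would be called as `select_low_ahi(pd.read_csv('shhs1-dataset-0.7.0.csv'))`, provided the column names match those used above.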