
Parsing SHHS shhs2-dataset-0.13.0.csv

martinhemmsen · 6 months ago

I'm having problems parsing the SHHS file shhs2-dataset-0.13.0.csv. It seems that the number of delimiter characters varies between the header and the subject rows.

I used the following Excel formula to count the delimiter characters in each row: =LEN(A1)-LEN(SUBSTITUTE(A1;",";"")). It shows that the number of delimiters differs not only between the header and the subject rows but also from one subject row to the next. The counts for the first 23 rows are listed below: 1283 for the header and between 1283 and 1287 for the subjects. I would expect this number to be the same on every row. Please advise on how to parse the file correctly.

Thanks, Best regards, Martin Hemmsen

1283 1283 1284 1287 1283 1283 1283 1283 1283 1283 1283 1283 1283 1283 1285 1284 1283 1283 1283 1284 1284 1283 1284
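The Excel formula counts every comma on a line, including commas that sit inside quoted text. A minimal Python sketch of the same check, placed next to a count of the fields that a CSV-aware parser actually sees, might look like this. The three sample lines are made up for illustration and are not taken from the SHHS file; the function names are hypothetical.

```python
import csv
import io


def naive_delimiter_count(line: str) -> int:
    # Same idea as =LEN(A1)-LEN(SUBSTITUTE(A1;",";"")): count every comma,
    # whether or not it is inside double quotes.
    return line.count(",")


def parsed_field_count(line: str) -> int:
    # csv.reader follows the usual quoting rules, so commas inside
    # a double-quoted field are treated as data, not as separators.
    row = next(csv.reader(io.StringIO(line)))
    return len(row)


# Hypothetical sample lines: the last one has commas inside a quoted field.
lines = [
    'id,comm,score',
    '1,"short note",5',
    '2,"note, with, commas",7',
]

for line in lines:
    print(naive_delimiter_count(line), parsed_field_count(line))
# prints:
# 2 3
# 2 3
# 4 3
```

The naive count jumps on the last line, while the parsed field count stays at 3, which is the pattern the per-row counts above would show if some rows carry commas inside quoted values.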

mrueschman · 5 months ago

Thanks for checking out the site and the SHHS dataset. I think the variation in the number of commas per line comes from the "Comm" variable (column). "Comm" is a free-text field containing scorer notes about the quality of the overnight sleep study, and some of these notes include commas. However, any field value that contains commas is enclosed in double quotes, which most CSV parsers handle correctly. The dataset reads correctly into Excel, SAS, and R for me.

Example snippet from "Comm":

1,0,0,6,"Lot of alpha-delta sleep but not alpha intrusion. Sleeps on back entire time, only change in position when awake. Airflow choppy at times (-1 hr), chest very small amp (- 1 hr), Low baseline saO2 ~92%, Desats into 70's in REM.",0,8,8,8,8,8,8,
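Feeding the example snippet above to Python's standard csv module illustrates the point: the quoted "Comm" text comes back as a single field even though it contains four commas. This is an illustrative sketch only; the snippet is a fragment of a full row, so the field count and index below apply to the fragment, not to the column layout of the full file.

```python
import csv
import io

# The example snippet from the post, split across lines for readability.
snippet = (
    '1,0,0,6,"Lot of alpha-delta sleep but not alpha intrusion. '
    'Sleeps on back entire time, only change in position when awake. '
    'Airflow choppy at times (-1 hr), chest very small amp (- 1 hr), '
    'Low baseline saO2 ~92%, Desats into 70\'s in REM.",0,8,8,8,8,8,8,'
)

row = next(csv.reader(io.StringIO(snippet)))

print(snippet.count(","))  # 16: every comma, including the 4 inside the quotes
print(len(row))            # 13: fields as the CSV parser sees them
# row[4] holds the whole quoted note; the trailing comma yields an empty last field.
```

A splitter that breaks on every comma would produce 17 pieces here instead of 13, which is exactly how the per-row counts drift.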

Here's a Stack Overflow post that describes handling commas in a CSV file.

Hope this helps!

martinhemmsen · 5 months ago

Dear Michael,

Thanks for following up on my post. I simply used Text to Columns in Excel and it works fine, returning the same number of columns for every row. My problem occurred because I used MATLAB to parse the file and my parser was too simple. Now I just save the file parsed by Excel and can easily load it in MATLAB.

Best regards, Martin
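Letting Excel do the parsing works; reading the file directly with a CSV-aware reader avoids the round trip. A minimal Python sketch, with hypothetical function names, that reads every row and reports any row whose field count differs from the header's. The UTF-8 encoding is an assumption about the file, not something stated in the thread.

```python
import csv
from typing import Iterable


def read_rows(lines: Iterable[str]) -> tuple[list[str], list[list[str]]]:
    """Parse CSV text into (header, data_rows) using csv's quoting rules."""
    reader = csv.reader(lines)
    header = next(reader)
    return header, list(reader)


def mismatched_rows(header: list[str], rows: list[list[str]]) -> list[tuple[int, int]]:
    """Return (row_number, field_count) for rows whose width differs from the header."""
    return [(i, len(r)) for i, r in enumerate(rows, start=1) if len(r) != len(header)]


def load_csv(path: str) -> tuple[list[str], list[list[str]]]:
    # newline="" lets the csv module handle line endings itself, including
    # line breaks inside quoted fields. encoding="utf-8" is an assumption.
    with open(path, newline="", encoding="utf-8") as f:
        return read_rows(f)
```

With the real file, `header, rows = load_csv("shhs2-dataset-0.13.0.csv")` followed by `mismatched_rows(header, rows)` should return an empty list if every row parses to the header's width.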
