
Trouble importing into Jupyter

ritwickagrawal +0 points · about 1 year ago

Hello, I have been trying to analyze the Sleep Heart Health Study 1 and 2 data in a Jupyter notebook using the Python kernel. When I run the command below, I get the following error. I looked online and saw that it might have something to do with the data being uploaded on a Mac and opened on Windows. Should I use Ruby to download the CSV file instead?

dfSH2=pd.read_csv("shhs2_dataset.csv")

UnicodeDecodeError                        Traceback (most recent call last)
~\AppData\Local\Temp\1\ipykernel_19908\3877291478.py in <module>
----> 1 dfSH2=pd.read_csv("shhs2_dataset.csv")

~\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    309                 stacklevel=stacklevel,
    310             )
--> 311         return func(*args, **kwargs)
    312
    313     return wrapper

~\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    676     kwds.update(kwds_defaults)
    677
--> 678     return _read(filepath_or_buffer, kwds)
    679
    680

~\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py in _read(filepath_or_buffer, kwds)
    579
    580     with parser:
--> 581         return parser.read(nrows)
    582
    583

~\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py in read(self, nrows)
   1251         nrows = validate_integer("nrows", nrows)
   1252         try:
-> 1253             index, columns, col_dict = self._engine.read(nrows)
   1254         except Exception:
   1255             self.close()

~\Anaconda3\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py in read(self, nrows)
    223         try:
    224             if self.low_memory:
--> 225                 chunks = self._reader.read_low_memory(nrows)
    226                 # destructive to chunks
    227                 data = _concatenate_chunks(chunks)

~\Anaconda3\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.read_low_memory()

~\Anaconda3\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

~\Anaconda3\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

~\Anaconda3\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 42968: invalid start byte

mrueschman +0 points · about 1 year ago

Thanks for using the site. You don't need Ruby to download the CSV datasets - it's more useful when downloading thousands of EDF and annotation files at once.

From what I can tell, this is likely an encoding issue. You may need to look up our CSV dataset's encoding and specify it in the read_csv command.
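As a rough illustration of that suggestion: byte 0x92 is the curly apostrophe in Windows-1252 (cp1252), so that encoding is a plausible first guess, though it is only a guess and not the dataset's documented encoding. The sketch below tries a short list of candidate encodings and reports the first one that decodes the whole file. The helper name pick_encoding and the candidate list are made up for this example.

```python
from pathlib import Path

# Candidate encodings, tried in order. This list is an assumption, not the
# dataset's documented encoding. latin-1 goes last because it accepts any
# byte sequence and therefore never fails.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def pick_encoding(path, candidates=CANDIDATES):
    """Return the first candidate encoding that decodes the whole file."""
    raw = Path(path).read_bytes()  # reads the full file into memory
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes; try the next
        return enc
    raise ValueError(f"none of {candidates} could decode {path}")


# Usage with pandas (file name taken from the post above):
#   import pandas as pd
#   enc = pick_encoding("shhs2_dataset.csv")
#   dfSH2 = pd.read_csv("shhs2_dataset.csv", encoding=enc)
```

A successful decode does not prove the encoding is the right one, so it is worth checking a few rows that contain apostrophes or accented characters after loading.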

I found some links that might be helpful:

https://stackoverflow.com/questions/18171739/unicodedecodeerror-when-reading-csv-file-in-pandas-with-python
https://www.kaggle.com/code/paultimothymooney/how-to-resolve-a-unicodedecodeerror-for-a-csv-file