
I am trying to read lines from a jsonl file, but I am getting the following error.

Traceback (most recent call last):
  File "insertion_script.py", line 12, in <module>
    for line in f.iter():
  File "C:\Users\Administrator\Anaconda3\lib\site-packages\jsonlines\jsonlines.py", line 204, in iter
    skip_empty=skip_empty)
  File "C:\Users\Administrator\Anaconda3\lib\site-packages\jsonlines\jsonlines.py", line 143, in read
    lineno, line = next(self._line_iter)
  File "C:\Users\Administrator\Anaconda3\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 886: invalid start byte

import jsonlines

BH_data = []
with jsonlines.open('2401659.jsonl', 'r') as f:
    for line in f.iter():
        BH_data.append(line)
mustafa zaki

1 Answer


The error means your data is not actually UTF-8. Byte 0xA3 is the pound sterling sign (£) in the Windows-1252 code page (cp1252), so the file was most likely saved in that encoding. Try opening it as cp1252:

import codecs
import jsonlines
with codecs.open('2401659.jsonl', 'r', encoding='cp1252') as jfile:
    with jsonlines.Reader(jfile) as f:
        for line in f.iter():
            BH_data.append(line)  # BH_data = [] as in the question
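
If you want to confirm the diagnosis first, or read the file without the jsonlines package, here is a minimal sketch using only the standard library. It assumes the file really is cp1252. The helper name read_jsonl, the sample file name, and the sample records are made up for illustration and are not taken from your data.

```python
import json
import os
import tempfile

# The byte from the traceback decodes to the pound sign under cp1252 ...
raw = b"\xa3"
pound = raw.decode("cp1252")

# ... but it is not a valid start byte in UTF-8, hence the original error.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False


def read_jsonl(path, encoding="cp1252"):
    """Hypothetical helper: parse one JSON value per non-empty line."""
    records = []
    with open(path, "r", encoding=encoding) as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Demo on a throwaway file written as cp1252 (sample data is invented).
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.jsonl")
    with open(sample_path, "wb") as out:
        out.write('{"price": "\u00a35"}\n{"price": "\u00a37"}\n'.encode("cp1252"))
    sample_records = read_jsonl(sample_path)
```

For your file the call would be read_jsonl('2401659.jsonl'). If the decoded text still looks wrong, the file may be in some other single-byte encoding, so check a few records containing non-ASCII characters before trusting the result.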
Tim Roberts