I'm trying to open a bz2 file and read the JSON file contained inside. My current implementation looks like this:

import bz2
import pandas as pd

with bz2.open(bz2_file_path, 'rb') as f:
    json_content = f.read()
json_df = pd.read_json(json_content.decode('utf-8'), lines=True)

I need to repeat this process many times, and the with block is taking up the bulk of the time. Is there a way I can speed this process up?

baked goods

1 Answer

The following variation of your code won't necessarily read all the data into memory at once. Opening the file in text mode and passing encoding to bz2.open() lets the decoding happen on the fly, and pandas.read_json() can accept a file-like object to read incrementally.

with bz2.open(bz2_file_path, 'rt', encoding='utf-8') as f:
  json_df = pd.read_json(f, lines=True)
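
To see the on-the-fly decoding in isolation, here is a minimal sketch that uses only the standard library. It assumes the file holds one JSON object per line (which is what lines=True implies); the helper name iter_json_lines, the sample rows, and the temporary file path are made up for illustration.

```python
import bz2
import json
import os
import tempfile


def iter_json_lines(path):
    """Yield one parsed JSON object per non-empty line of a bz2 file."""
    # 'rt' mode puts a text layer over the decompressor, so data is
    # decompressed and decoded as lines are requested, not all up front.
    with bz2.open(path, 'rt', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Build a tiny sample file so the sketch is self-contained (made-up data).
sample_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
sample_path = os.path.join(tempfile.mkdtemp(), "sample.json.bz2")
with bz2.open(sample_path, 'wt', encoding='utf-8') as f:
    for row in sample_rows:
        f.write(json.dumps(row) + "\n")

# Consume the generator; each record is parsed as its line is read.
records = list(iter_json_lines(sample_path))
```

The resulting list of dicts could be passed to pd.DataFrame(records) if a DataFrame is needed. Whether this is faster than pd.read_json() depends on the data, so it is worth timing both on a real file.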
orip