I want to import a JSON Lines file into pandas. I tried reading it like a regular JSON file, but it did not work:

js = pd.read_json(r'C:\Users\Name\Downloads\profilenotes.jsonl')
Michael M.
Jed

1 Answer

This Medium article provides a fairly simple answer, which can be adapted to be even shorter. A JSON Lines file holds one complete JSON document per line, so all you need to do is read the file line by line and parse each line with json.loads(). Like this:

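import json
import pandas as pd

# Read the file and split it into one string per line
with open(r'test.jsonl') as f:
    lines = f.read().splitlines()

# Parse each line as its own JSON document, then build the DataFrame
line_dicts = [json.loads(line) for line in lines]
df_final = pd.DataFrame(line_dicts)

print(df_final)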

As cgobat pointed out in a comment, the Medium article includes a few unnecessary steps, which have been removed in this answer.
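One caveat: json.loads() raises an error on an empty string, so a file with blank lines (or extra trailing newlines) would break the list comprehension. Below is a minimal sketch of a more defensive reader. The helper name `read_jsonl_records`, the sample data, and the UTF-8 encoding are assumptions made for illustration, not part of the original question.

```python
import json
import os
import tempfile


def read_jsonl_records(path):
    """Return one dict per non-blank line of a JSON Lines file (hypothetical helper)."""
    records = []
    # Assumes the file is UTF-8 encoded; adjust if yours is not.
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, which json.loads() would reject
                records.append(json.loads(line))
    return records


# Build a small throwaway file (made-up data) so the sketch runs on its own.
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"id": 1, "note": "first"}\n\n{"id": 2, "note": "second"}\n')
    sample_path = tmp.name

records = read_jsonl_records(sample_path)
os.remove(sample_path)
print(records)
```

The resulting list can be passed to `pd.DataFrame(records)` just like `line_dicts`. pandas can also read this format directly with `pd.read_json(path, lines=True)`, which may be all you need if the file is well formed.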

Michael M.
  • There's no need for an intermediate DataFrame, because you can directly initialize a DataFrame using an iterable of dicts. Just create a list like `line_dicts = [json.loads(line) for line in lines]`, then you can go straight to your final DataFrame by simply calling `df_final = pd.DataFrame(line_dicts)`. – L0tad Nov 11 '22 at 17:30
  • Also, line 12 here (`df_inter['json_element'].apply(json.loads)`) doesn't do anything. Since you're not calling it `inplace` and you don't assign it to anything, nothing happens to the result of that statement. Plus, you're already doing the same operation in the following line. – L0tad Nov 11 '22 at 17:32