
I have a huge file (>400 MB) of NDJSON-formatted data and would like to flatten it into a table format for further analysis.

I started iterating through the various objects manually, but some are rather deeply nested and might even change over time, so I was hoping for a more general approach.

I was certain the pandas library would offer something, but I could not find anything that fits my case. The other libraries I found (e.g. flatten_json) do not seem to fully provide what I was hoping for either. It all seems to be at a very early stage.

Is it possible that there is no good (fast and easy) solution for this at the moment?

Any help is appreciated.

DirkLX
  • Just a suggestion: whenever I need to analyse JSON data I tend to put it in a database. If the structure is known, I go with SQLite+SQLAlchemy. Alternatively, there are lots of NoSQL object-based databases that you can use to store and analyse/explore such files. However, with this approach you always need one "ingest" script and one or more analysis scripts... – urban Jun 10 '18 at 19:53
  • Thanks for the response but I was really hoping to get it done using pandas dataframe. Definitely don't want to introduce a DB at this time. – DirkLX Jun 11 '18 at 20:54

1 Answer


pandas' read_json has a boolean parameter, lines; set it to True to read NDJSON files:

import pandas as pd
data_frame = pd.read_json('ndjson_file.json', lines=True)
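
Note that reading with lines=True gives one row per line, but nested objects stay as dict values in their columns. If fully flat columns are needed, one possible approach is to flatten each record while streaming the file line by line. The following is only a minimal sketch using the standard library: the flatten and iter_flat_records helpers, the "." separator, and the sample records are all hypothetical and not part of pandas, and lists are left as-is rather than expanded.

```python
import io
import json


def flatten(obj, parent_key="", sep="."):
    """Recursively flatten nested dicts into a single-level dict.

    Nested keys are joined with `sep`. Lists are kept as plain values
    (an assumption of this sketch; expand them if your data needs it).
    """
    flat = {}
    for key, value in obj.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def iter_flat_records(lines):
    """Yield one flattened dict per non-empty NDJSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield flatten(json.loads(line))


# Hypothetical sample data standing in for the real file; with a real
# file, pass the open file object instead of the StringIO.
sample = io.StringIO(
    '{"id": 1, "user": {"name": "a", "geo": {"lat": 1.5}}}\n'
    '{"id": 2, "user": {"name": "b"}, "tags": ["x"]}\n'
)
records = list(iter_flat_records(sample))
```

The resulting list of flat dicts can be passed to pd.DataFrame(records); keys missing from a record should simply show up as NaN in that row. pandas also ships pd.json_normalize, which does a similar flattening for a list of already-parsed dicts.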