
Given a nested JSON file, is there a way to load and flatten it in vaex?

This is a way to do it in pandas:

import pandas as pd
from pandas.io.json import json_normalize  # pd.json_normalize in pandas >= 1.0

# read the file, then flatten the nested "eventData" dicts into their own
# columns and concatenate them onto the original dataframe
df = pd.read_json(input_file)
df = pd.concat([df, json_normalize(df['eventData'])], axis=1)

The JSON could look something like this:

[
{"timestamp":..., "id": ..., "eventData": {"type":..., "name":...}},
{"timestamp":..., "id": ..., "eventData": {"type":..., "name":...}}
]

And the dataframe's columns should be "timestamp", "id", "type" and "name".

scc
  • If you transform the data slightly, you can probably use `vaex.from_dict()`, no? – AMC Jan 29 '20 at 19:53
  • @AMC Not really sure what you mean. `from_dict` expects a dictionary, not a list of dictionaries, which is what `df['eventData']` is. In pandas I could also use `from_records` and `concat` with `axis=1` (i.e. `df = pd.concat([pd.read_json(f), pd.DataFrame.from_records(df['eventData'])], axis=1)`), but none of those are available in vaex. – scc Jan 30 '20 at 10:51
  • Did you figure this out? – Khashir Jun 02 '20 at 21:19
  • If you can already do this with pandas, why not just do `vaex_df = vaex.from_pandas(pandas_df)` ? – Joco Jun 10 '20 at 08:55
  • @Joco The issue with doing `json_normalize` in `pandas` is the memory usage. On large datasets with large JSON data in a column, the function does not complete in `pandas`. – rchitect-of-info Jun 23 '21 at 13:43
  • @rchitect-of-info Indeed, so in that case one will need to resort to chunking the file, exporting smaller manageable bits to hdf5 or arrow, and opening them all at once with vaex. – Joco Jun 28 '21 at 13:28
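Below is a rough sketch of the chunking workflow described in the last comment above: flatten the nested records in manageable pieces with pandas, export each piece to HDF5 with vaex, then open all the pieces as one memory-mapped vaex dataframe. The file name "events.json", the chunk size and the column handling are assumptions based on the example in the question, not a tested solution.

import json

import pandas as pd
import vaex

# load the raw list of records; this assumes the raw JSON itself fits in memory
# and only the flattened/normalized dataframe is built piece by piece
with open("events.json") as f:
    records = json.load(f)

chunk_size = 100_000
for i in range(0, len(records), chunk_size):
    chunk = records[i:i + chunk_size]
    # pd.json_normalize flattens the nested "eventData" dicts into columns
    # named "eventData.type", "eventData.name", etc.
    pdf = pd.json_normalize(chunk)
    pdf.columns = [c.split(".")[-1] for c in pdf.columns]  # drop the prefix
    path = f"events_chunk_{i // chunk_size:04d}.hdf5"
    vaex.from_pandas(pdf, copy_index=False).export_hdf5(path)

# vaex.open accepts a glob pattern and lazily concatenates the exported chunks
df = vaex.open("events_chunk_*.hdf5")
print(df.column_names)  # ['timestamp', 'id', 'type', 'name']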

0 Answers