I'm new to dask, so bear with me.
I have a JSON file where each line is an object with the following schema (participants holds ten values, written out here since range(10) isn't valid JSON):

{
    "id": 2,
    "version": 7.3,
    "participants": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
}

participants is a nested field (a list per record).
import json
import dask.bag as db

input_file = 'data.json'
df = db.read_text(input_file).map(json.loads)
I can do either:
df.pluck(['id', 'version'])
or
df.pluck('participants').flatten()
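For concreteness, this is what I understand those two calls to produce per record, reproduced in plain Python (pluck with a list of keys yields a tuple of the selected values):

```python
record = {'id': 2, 'version': 7.3, 'participants': list(range(10))}

# df.pluck(['id', 'version']) -> one tuple of the selected values per record
plucked = (record['id'], record['version'])    # (2, 7.3)

# df.pluck('participants').flatten() -> the nested lists concatenated,
# i.e. the participant values themselves
flattened = list(record['participants'])       # [0, 1, ..., 9]
```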
But how can I do the equivalent of a Spark explode, where I select id and version and flatten participants at the same time?
So the output would be:
{'id': 2, 'version': 7.3, 'participants': 0}
{'id': 2, 'version': 7.3, 'participants': 1}
{'id': 2, 'version': 7.3, 'participants': 2}
{'id': 2, 'version': 7.3, 'participants': 3}
...
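To pin down the semantics I'm after, here is the same transformation on a single record in plain Python (the explode helper name is mine, not a dask or Spark API):

```python
import json

# A single line from the file, matching the schema above
# (shortened to four participants).
line = '{"id": 2, "version": 7.3, "participants": [0, 1, 2, 3]}'
record = json.loads(line)

def explode(rec):
    # Emit one dict per element of `participants`,
    # copying the scalar fields alongside it.
    return [{'id': rec['id'], 'version': rec['version'], 'participants': p}
            for p in rec['participants']]

rows = explode(record)
# rows == [{'id': 2, 'version': 7.3, 'participants': 0}, ...,
#          {'id': 2, 'version': 7.3, 'participants': 3}]
```

I'm looking for the idiomatic dask.bag way to apply this per record across the whole bag.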