I have a dask dataframe containing a column of JSON strings, and I want to parse that column into dataframe format.

The JSON column looks like:

{"Name": {"id": 1000, "address": "ABC", ...}}, ...

So I want to extract only the value of "Name" and make each key inside it a column, with each value becoming a row value, like:

id    address ...
1000  ABC
2000  DEF
3000  GHA
...   ...

I think we can read a JSON file into a dask dataframe with read_json, but how could I do that here?

SayZ
  • How would you do this with Pandas? – quasiben May 14 '20 at 01:44
  • If it were a pandas dataframe, I would use json_normalize from pandas.io.json, like this (it doesn't work on a dask dataframe): `df_json = json_normalize(df['json_col'].apply(lambda x: json.loads(x))); df_json.head()` – SayZ May 14 '20 at 04:35
  • So you could do something similar with dask bag: `db.read_text('data.jsonl').map(json.loads)`, then convert to a dataframe with `.to_dataframe()` (see the sketch after these comments). Have you read over https://examples.dask.org/applications/json-data-on-the-web.html ? – quasiben May 14 '20 at 13:04
  • @quasiben, please submit this as an answer, so it doesn't look like the question is pending – mdurant May 14 '20 at 13:19
  • @quasiben Sorry, there's one thing I did not mention: I read the data from MySQL using the read_sql_table method, so I can't use other read methods like read_text. The dataframe returned by read_sql_table contains a column in JSON format, which is what I want to normalize. – SayZ May 15 '20 at 01:59
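
For reference, a minimal sketch of the dask bag route suggested above, assuming newline-delimited JSON in a hypothetical file data.jsonl (it doesn't cover the read_sql_table case, which the answer below addresses):

import json
import dask.bag as db

# Read newline-delimited JSON, parse each line into a dict,
# keep only the inner "Name" record, then build a dataframe.
bag = db.read_text('data.jsonl').map(json.loads)
ddf = bag.map(lambda d: d['Name']).to_dataframe()
print(ddf.head())  # columns: id, address, ...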

1 Answer

The operation that you're doing appears to be embarrassingly parallel. As a result, you can write a Pandas function and then apply that function across a dask dataframe in parallel.

import pandas

def f(df: pandas.DataFrame) -> pandas.DataFrame:
    ...  # however you would do this in Pandas

ddf = ddf.map_partitions(f)
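
For the concrete case from the comments (a JSON column coming out of read_sql_table), a sketch might look like the following; the column name json_col and the id/address schema in meta are assumptions about your data, and meta is needed so Dask knows the output columns without evaluating the function eagerly (see the comment below):

import json
import pandas as pd

def parse_json_column(df: pd.DataFrame) -> pd.DataFrame:
    # Parse each JSON string, keep the inner "Name" record,
    # and expand its keys into columns.
    records = df['json_col'].apply(lambda s: json.loads(s)['Name'])
    return pd.json_normalize(records.tolist())

# meta describes the output schema (assumed columns/dtypes here)
meta = pd.DataFrame({'id': pd.Series(dtype='int64'),
                     'address': pd.Series(dtype='object')})
ddf = ddf.map_partitions(parse_json_column, meta=meta)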
MRocklin
  • This won't work without the `meta` keyword. Please provide complete answers or avoid commenting at all! – Dzeri96 Aug 15 '21 at 13:30