
I have data in a dataframe, as seen below (BEFORE).

I am trying to parse/flatten the JSON in the site_Activity column, but I am having no luck.

I have tried some of the methods below, as proof that I have attempted to solve this on my own.

I have provided a DESIRED AFTER section to highlight how I would expect the data to parse.

Any help is greatly appreciated!

Not working: df = df.explode(column='site_Activity').reset_index(drop=True) (from https://stackoverflow.com/questions/54546279/how-to-normalize-json-string-type-column-of-pandas-dataframe)

Not working: pd.json_normalize(df.site_Activity[0]) (from "How to convert JSON data inside a pandas column into new columns")

BEFORE

id site_Activity
123 [{"action_time":"2022-07-05T01:53:59.000000Z","time_spent":12,"url":"cool.stuff.io/advanced"},{"action_time":"2022-07-05T00:10:20.000000Z","time_spent":0,"url":"cool.stuff.io/advanced1"},{"action_time":"2022-07-04T23:45:39.000000Z","time_spent":0,"url":"cool.stuff.io"}]
456 [{"action_time":"2022-07-04T23:00:23.000000Z","time_spent":0,"url":"cool.stuff.io/awesome"}]

DESIRED AFTER

id action_time time_spent url
123 2022-07-05T01:53:59.000000Z 12 cool.stuff.io/advanced
123 2022-07-05T00:10:20.000000Z 0 cool.stuff.io/advanced1
123 2022-07-04T23:45:39.000000Z 0 cool.stuff.io
456 2022-07-04T23:00:23.000000Z 0 cool.stuff.io/awesome
hansolo

2 Answers


You can:

  • use .apply(json.loads) to transform the JSON-string column into a column of lists of dicts;
  • use df.explode to transform each list of dicts into a Series of dicts;
  • use .apply(pd.Series) to expand the Series of dicts into a DataFrame;
  • use pd.concat to merge the new columns back onto the rest of the data.

Putting it together:

import io
import json

import pandas as pd

TESTDATA = """id;site_Activity
123;[{"action_time":"2022-07-05T01:53:59.000000Z","time_spent":12,"url":"cool.stuff.io/advanced"},{"action_time":"2022-07-05T00:10:20.000000Z","time_spent":0,"url":"cool.stuff.io/advanced1"},{"action_time":"2022-07-04T23:45:39.000000Z","time_spent":0,"url":"cool.stuff.io"}]
456;[{"action_time":"2022-07-04T23:00:23.000000Z","time_spent":0,"url":"cool.stuff.io/awesome"}]
780;
"""

df = pd.read_csv(io.StringIO(TESTDATA), sep=";")

# Fill empty rows with an empty JSON list so json.loads does not fail,
# then parse each JSON string into a list of dicts
df["site_Activity"] = df["site_Activity"].fillna("[{}]").apply(json.loads)

# One row per dict in each list (the id is repeated across its rows)
df = df.explode("site_Activity")

# Expand each dict into its own columns and glue them back onto id
df = pd.concat([df[["id"]], df["site_Activity"].apply(pd.Series)], axis=1)

print(df)
# result
    id                  action_time  time_spent                      url
0  123  2022-07-05T01:53:59.000000Z          12   cool.stuff.io/advanced
0  123  2022-07-05T00:10:20.000000Z           0  cool.stuff.io/advanced1
0  123  2022-07-04T23:45:39.000000Z           0            cool.stuff.io
1  456  2022-07-04T23:00:23.000000Z           0    cool.stuff.io/awesome
2  780                          NaN         NaN                      NaN
Caio
  • Thank you first and foremost for your help! I attempted to recreate this, but I get the error below: `the JSON object must be str, bytes or bytearray, not NoneType` – hansolo Jul 13 '22 at 14:05
  • That's because you might have some NaN (empty) values in your dataset. Try the fillna method before applying json.loads: `df["site_Activity"] = df["site_Activity"].fillna("[{}]").apply(json.loads)` – Caio Jun 15 '23 at 22:56
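
To make the comment's point concrete, here is a quick sketch of the failure mode, using a small made-up Series (the values here are illustrative, not from the question):

import json

import pandas as pd

# One valid JSON string and one missing value, as in the comment above
s = pd.Series(['[{"url":"cool.stuff.io"}]', None])

# s.apply(json.loads) raises:
#   TypeError: the JSON object must be str, bytes or bytearray, not NoneType
# Filling missing values with an empty JSON list first avoids that:
parsed = s.fillna("[{}]").apply(json.loads)
print(parsed)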

You can use this:

df = pd.json_normalize(json_data_var, 'site_Activity', ['id'])

Here json_data_var is the already-parsed JSON (a dict or list of dicts, not a raw string). 'site_Activity' without the square brackets is the record path, i.e. the 'many' part; 'id' in the square brackets is metadata, i.e. the 'one' part, repeated onto each record row.
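
Applied to the data in the question, a minimal sketch might look like this (assuming site_Activity holds JSON strings, as in the BEFORE table, so each row is parsed with json.loads before normalizing):

import io
import json

import pandas as pd

TESTDATA = """id;site_Activity
123;[{"action_time":"2022-07-05T01:53:59.000000Z","time_spent":12,"url":"cool.stuff.io/advanced"}]
456;[{"action_time":"2022-07-04T23:00:23.000000Z","time_spent":0,"url":"cool.stuff.io/awesome"}]
"""
df = pd.read_csv(io.StringIO(TESTDATA), sep=";")

# One dict per row, with site_Activity parsed into a list of dicts
records = df.assign(site_Activity=df["site_Activity"].apply(json.loads)).to_dict("records")

# 'site_Activity' is the record path (the 'many' side); 'id' rides along
# as metadata (the 'one' side), repeated for each activity row
out = pd.json_normalize(records, record_path="site_Activity", meta=["id"])
print(out)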

Take this example:

import requests
import pandas as pd

url = 'https://someURL.com/burgers'

headers = {
    'X-RapidAPI-Key': 'someKey',
    'X-RapidAPI-Host': 'someHost.someAPI.com'
}

response = requests.get(url, headers=headers)

api_data = response.json()

df = pd.json_normalize(api_data)

print(df)

This now gives me a dataframe where the 'ingredients' and 'addresses' columns still contain nested data.


I can then focus on 'addresses' and normalise that:

df = pd.json_normalize(api_data, 'addresses', ['name', 'restaurant', 'web', 'description'])

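Since the screenshots aren't reproduced here, a minimal sketch with a hypothetical payload shaped like that API response shows both steps (the field names and values below are made up for illustration):

import pandas as pd

# Hypothetical payload, shaped like the burger API response described above
api_data = [
    {
        "name": "Classic Burger",
        "restaurant": "Burger Hub",
        "web": "burgerhub.example",
        "description": "A classic.",
        "ingredients": ["beef", "lettuce", "tomato"],
        "addresses": [
            {"city": "Austin", "state": "TX"},
            {"city": "Dallas", "state": "TX"},
        ],
    },
]

# First pass: list-valued fields stay packed inside single columns
print(pd.json_normalize(api_data))

# Second pass: 'addresses' is the record path (the 'many' side); the other
# fields are repeated onto each address row as metadata (the 'one' side)
print(pd.json_normalize(api_data, 'addresses',
                        ['name', 'restaurant', 'web', 'description']))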

opperman.eric