
I have data in a dataframe, as seen below (BEFORE).

I am trying to parse/flatten the JSON in the site_Activity column, but I am having no luck.

I have tried some of the methods below, as proof that I have attempted to solve this on my own.

I have provided a DESIRED AFTER section to highlight how I would expect the data to parse.

Any help is greatly appreciated!

Not working: df = df.explode(column='site_Activity').reset_index(drop=True) (from https://stackoverflow.com/questions/54546279/how-to-normalize-json-string-type-column-of-pandas-dataframe)

Not working: pd.json_normalize(df.site_Activity[0]) (from "How to convert JSON data inside a pandas column into new columns")

BEFORE

id site_Activity
123 [{"action_time":"2022-07-05T01:53:59.000000Z","time_spent":12,"url":"cool.stuff.io/advanced"},{"action_time":"2022-07-05T00:10:20.000000Z","time_spent":0,"url":"cool.stuff.io/advanced1"},{"action_time":"2022-07-04T23:45:39.000000Z","time_spent":0,"url":"cool.stuff.io"}]
456 [{"action_time":"2022-07-04T23:00:23.000000Z","time_spent":0,"url":"cool.stuff.io/awesome"}]

DESIRED AFTER

id action_time time_spent url
123 2022-07-05T01:53:59.000000Z 12 cool.stuff.io/advanced
123 2022-07-05T00:10:20.000000Z 0 cool.stuff.io/advanced1
123 2022-07-04T23:45:39.000000Z 0 cool.stuff.io
456 2022-07-04T23:00:23.000000Z 0 cool.stuff.io/awesome
hansolo

2 Answers


You can:

  • use .apply(json.loads) to transform the JSON-string column into a column of lists of dicts;
  • use df.explode to transform each list of dicts into a Series of dicts;
  • use .apply(pd.Series) to expand the Series of dicts into a DataFrame;
  • use pd.concat to merge the new columns back onto the rest of the data.

Putting it together:

import io
import json

import pandas as pd

TESTDATA = """id;site_Activity
123;[{"action_time":"2022-07-05T01:53:59.000000Z","time_spent":12,"url":"cool.stuff.io/advanced"},{"action_time":"2022-07-05T00:10:20.000000Z","time_spent":0,"url":"cool.stuff.io/advanced1"},{"action_time":"2022-07-04T23:45:39.000000Z","time_spent":0,"url":"cool.stuff.io"}]
456;[{"action_time":"2022-07-04T23:00:23.000000Z","time_spent":0,"url":"cool.stuff.io/awesome"}]
780;
"""

df = pd.read_csv(io.StringIO(TESTDATA), sep=";")

# Fill empty rows with an empty JSON list so json.loads does not fail,
# then parse each JSON string into a list of dicts
df["site_Activity"] = df["site_Activity"].fillna("[{}]").apply(json.loads)

# One row per dict in each list (the id is repeated across its rows)
df = df.explode("site_Activity")

# Expand each dict into its own columns and glue them back onto id
df = pd.concat([df[["id"]], df["site_Activity"].apply(pd.Series)], axis=1)

print(df)
# result
    id                  action_time  time_spent                      url
0  123  2022-07-05T01:53:59.000000Z          12   cool.stuff.io/advanced
0  123  2022-07-05T00:10:20.000000Z           0  cool.stuff.io/advanced1
0  123  2022-07-04T23:45:39.000000Z           0            cool.stuff.io
1  456  2022-07-04T23:00:23.000000Z           0    cool.stuff.io/awesome
2  780                          NaN         NaN                      NaN
Caio
  • Thank you first and foremost for your help! I attempted to recreate this, but I get the error below: `the JSON object must be str, bytes or bytearray, not NoneType` – hansolo Jul 13 '22 at 14:05
  • That's because you might have some NaN (empty) values in your dataset. Try the fillna method before applying json.loads: `df["site_Activity"] = df["site_Activity"].fillna("[{}]").apply(json.loads)` – Caio Jun 15 '23 at 22:56
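
To make the comment's point concrete, here is a quick sketch of the failure mode, using a small made-up Series (the values here are illustrative, not from the question):

import json

import pandas as pd

# One valid JSON string and one missing value, as in the comment above
s = pd.Series(['[{"url":"cool.stuff.io"}]', None])

# s.apply(json.loads) raises:
#   TypeError: the JSON object must be str, bytes or bytearray, not NoneType
# Filling missing values with an empty JSON list first avoids that:
parsed = s.fillna("[{}]").apply(json.loads)
print(parsed)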

You can use this:

df = pd.json_normalize(json_data_var, 'site_Activity', ['id'])

Here json_data_var is the already-parsed JSON (a dict or list of dicts, not a raw string). 'site_Activity' without the square brackets is the record path, i.e. the 'many' part; 'id' in the square brackets is metadata, i.e. the 'one' part, repeated onto each record row.
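
Applied to the data in the question, a minimal sketch might look like this (assuming site_Activity holds JSON strings, as in the BEFORE table, so each row is parsed with json.loads before normalizing):

import io
import json

import pandas as pd

TESTDATA = """id;site_Activity
123;[{"action_time":"2022-07-05T01:53:59.000000Z","time_spent":12,"url":"cool.stuff.io/advanced"}]
456;[{"action_time":"2022-07-04T23:00:23.000000Z","time_spent":0,"url":"cool.stuff.io/awesome"}]
"""
df = pd.read_csv(io.StringIO(TESTDATA), sep=";")

# One dict per row, with site_Activity parsed into a list of dicts
records = df.assign(site_Activity=df["site_Activity"].apply(json.loads)).to_dict("records")

# 'site_Activity' is the record path (the 'many' side); 'id' rides along
# as metadata (the 'one' side), repeated for each activity row
out = pd.json_normalize(records, record_path="site_Activity", meta=["id"])
print(out)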

Take this example:

import requests
import pandas as pd

url = 'https://someURL.com/burgers'

headers = {
    'X-RapidAPI-Key': 'someKey',
    'X-RapidAPI-Host': 'someHost.someAPI.com'
}

response = requests.get(url, headers=headers)

api_data = response.json()

df = pd.json_normalize(api_data)

print(df)

This now gives me a dataframe where the 'ingredients' and 'addresses' columns still contain nested data.


I can then focus on 'addresses' and normalise that:

df = pd.json_normalize(api_data, 'addresses', ['name', 'restaurant', 'web', 'description'])

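Since the screenshots aren't reproduced here, a minimal sketch with a hypothetical payload shaped like that API response shows both steps (the field names and values below are made up for illustration):

import pandas as pd

# Hypothetical payload, shaped like the burger API response described above
api_data = [
    {
        "name": "Classic Burger",
        "restaurant": "Burger Hub",
        "web": "burgerhub.example",
        "description": "A classic.",
        "ingredients": ["beef", "lettuce", "tomato"],
        "addresses": [
            {"city": "Austin", "state": "TX"},
            {"city": "Dallas", "state": "TX"},
        ],
    },
]

# First pass: list-valued fields stay packed inside single columns
print(pd.json_normalize(api_data))

# Second pass: 'addresses' is the record path (the 'many' side); the other
# fields are repeated onto each address row as metadata (the 'one' side)
print(pd.json_normalize(api_data, 'addresses',
                        ['name', 'restaurant', 'web', 'description']))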

opperman.eric