0

Hi I am attempting to reshape this json data within a dataframe using pandas.

      id        categories
1     3ee877e0  [{"entity_def_id":"category","permalink":"blockchain","uuid":"1fea6201","value":"Blockchain"},{"entity_def_id":"category","permalink":"cryptocurrency","uuid":"bd082f4d","value":"Cryptocurrency"},{"entity_def_id":"category","permalink":"loyalty-programs","uuid":"4a45af54","value":"Loyalty Programs"},{"entity_def_id":"category","permalink":"marketplace-772d","uuid":"772da8fe","value":"Marketplace"},{"entity_def_id":"category","permalink":"software","uuid":"c08b5441","value":"Software"}]

Expected outcome

id        entity_def_id  permalink         uuid        value
3ee877e0  category       blockchain        1fea6201    Blockchain
3ee877e0  category       cryptocurrency    bd082f4d    Cryptocurrency
3ee877e0  category       loyalty-programs  4a45af54    Loyalty Programs
3ee877e0  category       marketplace-772d  772da8fe    Marketplace
3ee877e0  category       software          c08b5441    Software

Sorry for not posting my attempts at doing so, but I am new to python, and already know how to do it in mongodb and dataiku, just want to know of a way to do so using python.

dhruv
  • 151
  • 2
  • 3
  • 14

2 Answers2

0

You can try explode categories columns and then convert dictionaries in categories column to multiple columns

out = (df.assign(categories=df['categories'].apply(eval))
       .explode('categories', ignore_index=True)
       .pipe(lambda df: df.join(pd.DataFrame(df.pop('categories').values.tolist()))))
print(out)

         id entity_def_id         permalink      uuid             value
0  3ee877e0      category        blockchain  1fea6201        Blockchain
1  3ee877e0      category    cryptocurrency  bd082f4d    Cryptocurrency
2  3ee877e0      category  loyalty-programs  4a45af54  Loyalty Programs
3  3ee877e0      category  marketplace-772d  772da8fe       Marketplace
4  3ee877e0      category          software  c08b5441          Software
Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52
  • it gives me error " : 'DataFrame' object has no attribute 'explode' " – dhruv Aug 10 '22 at 16:21
  • @dhruv Your pandas version seems to be lower than 0.25.0, try upgrade it. – Ynjxsjmh Aug 10 '22 at 16:28
  • it ran successfully after upgrade, but I am not getting desired output ``` id categories 1 3ee877e0-f008-45df-9606-ccc1a7a22a92 [{"entity_def_id":"category","permalink":"adve... id 0 1 3ee877e0-f008-45df-9606-ccc1a7a22a92 [{"entity_def_id":"category","permalink":"adve... ``` basically it just added "0" as column header – dhruv Aug 10 '22 at 17:18
  • @dhruv What's the result of `type(df['categories'].iloc[0])`? – Ynjxsjmh Aug 10 '22 at 17:56
  • the result for the query is : – dhruv Aug 10 '22 at 18:06
  • @dhruv Check the update – Ynjxsjmh Aug 10 '22 at 18:15
  • it works, thanks a lot, I still would need to learn what the code is doing here. but I am extremely grateful for your help – dhruv Aug 10 '22 at 18:21
0

You could take the list of dicts in categories, pass it to DataFrame() as-is, and then insert your id column using insert

import pandas as pd

current_df = pd.DataFrame({"id": "3ee877e0","categories":[[{"entity_def_id":"category","permalink":"blockchain","uuid":"1fea6201","value":"Blockchain"},{"entity_def_id":"category","permalink":"cryptocurrency","uuid":"bd082f4d","value":"Cryptocurrency"},{"entity_def_id":"category","permalink":"loyalty-programs","uuid":"4a45af54","value":"Loyalty Programs"},{"entity_def_id":"category","permalink":"marketplace-772d","uuid":"772da8fe","value":"Marketplace"},{"entity_def_id":"category","permalink":"software","uuid":"c08b5441","value":"Software"}]]})

id_ = current_df.loc[:,"id"].values[0]
categories = current_df.loc[:,"categories"].values[0]

new_df = pd.DataFrame(categories )
new_df.insert(0,"id", id_ )

result

         id entity_def_id         permalink      uuid             value
0  3ee877e0      category        blockchain  1fea6201        Blockchain
1  3ee877e0      category    cryptocurrency  bd082f4d    Cryptocurrency
2  3ee877e0      category  loyalty-programs  4a45af54  Loyalty Programs
3  3ee877e0      category  marketplace-772d  772da8fe       Marketplace
4  3ee877e0      category          software  c08b5441          Software
Joe Carboni
  • 421
  • 1
  • 6
  • it gave me ": DataFrame constructor not properly called!" error – dhruv Aug 10 '22 at 17:26
  • @dhruv - That's odd. I'm able to copy-paste this code into Python (3.9) and it works for me with Pandas version 1.2.3. Can you provide code that would allow me to recreate your initial dataframe as you have it shown? In my answer, I attempted to recreate it, but it's possible that it's not exactly what you have. – Joe Carboni Aug 10 '22 at 18:15