2

I am trying to normalize json function using pandas but not able to flatten the nested arrays inside json

I have tried this reading one of the example but it only gives me one record using this code

from pandas.io.json import json_normalize
data ={
"id":"001",
    "counties" : [ {"name":"y"},{"name":"X"}], 
"extendedDescriptionFrench" : "Fromage Brick Petit Gaspesien",
"brand" : "PETIT GASPESIEN",
"brandFrench" : "PETIT GASPESIEN",
"productLife" : "90",
"digitalAssetFoodservice" : [ {
"digitalAssetFormatFoodservice" : "JPG",
"digitalAssetGDTIFoodservice" : "754000000016500000000002167445",
"digitalAssetImageVersionDateTimeFoodservice" : "2016-06-28T20:06:06.000-04:00",
"digitalAssetStateFoodservice" : "P"
}, {
"digitalAssetFormatFoodservice" : "JPG",
"digitalAssetGDTIFoodservice" : "754000000016500000000002167597",
"digitalAssetImageTypeFoodservice" : "M",
"digitalAssetImageVersionDateTimeFoodservice" : "2016-06-28T20:06:06.000-04:00"
}, {
"digitalAssetFormatFoodservice" : "JPG",
"digitalAssetGDTIFoodservice" : "754000000016500000000002167687",
"digitalAssetImageTypeFoodservice" : "C",
"digitalAssetImageVersionDateTimeFoodservice" : "2016-06-28T20:06:06.000-04:00",
"digitalAssetStateFoodservice" : "C"
} ]
}

a=json_normalize(data)
    print(a)

Is there anyway to flatten "digitalAssetFoodservice" array into columns.

Here is the output that i am getting Current ouput

And Also what if i have more than one nested array fields

adnan
  • 504
  • 1
  • 4
  • 21

1 Answers1

1

I believe that you need to pass the key of the nested array together with the non-nested ones.

a=json_normalize(data,'digitalAssetFoodservice',['id','extendedDescriptionFrench','brand','productLife'])

print(a)
print(a.columns)

Out:

  digitalAssetFormatFoodservice     digitalAssetGDTIFoodservice digitalAssetImageVersionDateTimeFoodservice  ...      extendedDescriptionFrench            brand productLife
0                           JPG  754000000016500000000002167445               2016-06-28T20:06:06.000-04:00  ...  Fromage Brick Petit Gaspesien  PETIT GASPESIEN          90
1                           JPG  754000000016500000000002167597               2016-06-28T20:06:06.000-04:00  ...  Fromage Brick Petit Gaspesien  PETIT GASPESIEN          90
2                           JPG  754000000016500000000002167687               2016-06-28T20:06:06.000-04:00  ...  Fromage Brick Petit Gaspesien  PETIT GASPESIEN          90

[3 rows x 9 columns]
Index(['digitalAssetFormatFoodservice', 'digitalAssetGDTIFoodservice',
       'digitalAssetImageVersionDateTimeFoodservice',
       'digitalAssetStateFoodservice', 'digitalAssetImageTypeFoodservice',
       'id', 'extendedDescriptionFrench', 'brand', 'productLife'],
      dtype='object')

When trying for two cases:

This is what I came up with given I couldn't not find a way to generate in one line your expected output. I also changed the script a bit to make it easier. (Based on: "Question: What is your expected output? Everyting duplicated, once for name: x and one for name: y ? OP: yes , duplicated"

from pandas.io.json import json_normalize
data ={
"id":"001",
"counties" : [ {"name":"y"},{"name":"X"}], 
"eDF" : "Fromage Brick Petit Gaspesien",
"brand" : "PETIT GASPESIEN",
"brandFrench" : "PETIT GASPESIEN",
"productLife" : "90",
"dAF" : [ {
"dAFF" : "JPG",
"dAGDTIF" : "75401652167445",
"dAIVDTF" : "2016-06-28",
"dASF" : "P"
}, {
"dAFF" : "JPG",
"dAGDTIF" : "75401652167597",
"dAITFa" : "M",
"dAIVDTF" : "2016-06-28"
}, {
"dAFF" : "JPG",
"dAGDTIF" : "7540162167687",
"dAITF" : "C",
"dAIVDTF" : "2016-06-28",
"dASF" : "C"
} ]
}

repetitive = ['id','eDF','brand','brandFrench','productLife']
a=json_normalize(data,'counties',repetitive)
b=json_normalize(data,'dAF',repetitive)
c = a.merge(b,how='inner',left_on=repetitive,right_on=repetitive)
print(a)

Output:

 name   id                            eDF            brand      brandFrench productLife
0    y  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90
1    X  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90

Now the other b:

print(b)

Output:

  dAFF         dAGDTIF     dAIVDTF dASF dAITFa dAITF   id                            eDF            brand      brandFrench productLife
0  JPG  75401652167445  2016-06-28    P    NaN   NaN  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90
1  JPG  75401652167597  2016-06-28  NaN      M   NaN  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90
2  JPG   7540162167687  2016-06-28    C    NaN     C  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90

Finally c:

print(c)

Output:

  name   id                            eDF            brand      brandFrench productLife dAFF         dAGDTIF     dAIVDTF dASF dAITFa dAITF
0    y  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90  JPG  75401652167445  2016-06-28    P    NaN   NaN
1    y  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90  JPG  75401652167597  2016-06-28  NaN      M   NaN
2    y  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90  JPG   7540162167687  2016-06-28    C    NaN     C
3    X  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90  JPG  75401652167445  2016-06-28    P    NaN   NaN
4    X  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90  JPG  75401652167597  2016-06-28  NaN      M   NaN
5    X  001  Fromage Brick Petit Gaspesien  PETIT GASPESIEN  PETIT GASPESIEN          90  JPG   7540162167687  2016-06-28    C    NaN     C
Celius Stingher
  • 17,835
  • 6
  • 23
  • 53
  • Can i also pass other nested array fields in json_normalize function? – adnan Sep 19 '19 at 16:05
  • Yes, edit your question and I'll edit my asnwer accordingly. – Celius Stingher Sep 19 '19 at 16:20
  • Okay So for every nested field we have to create another variable and then merge altogether right using inner Right? – adnan Sep 19 '19 at 18:11
  • 1
    I just couldn't find a way to work it in one line, I've been trying to search for some examples and also couldn't make lostCode's answer work. You might try here, maybe you can find something https://stackoverflow.com/questions/47242845/pandas-io-json-json-normalize-with-very-nested-json However it is not the same, I hope my answer was helpful enough to help you solve your problem. – Celius Stingher Sep 19 '19 at 18:17
  • Yes it was useful , Thank you :) – adnan Sep 20 '19 at 06:56