How to convert JSON data inside a pandas column into new columns

Question

I have this short version of ADSB json data and would like to convert it into dataFrame columns as Icao, Alt, Lat, Long, Spd, Cou.....

After Alperen told me to do this

df = pd.read_json('2016-06-20-2359Z.json', lines=True),

I can load it into a DataFrame. However, df.acList is

[{'Id': 10537990, 'Rcvr': 1, 'HasSig': False, ...
Name: acList, dtype: object

How can I get the Icao, Alt, Lat, Long, Spd, Cou data?

 "src":1,
   "feeds":[  
      {  
         "id":1,
         "name":"ADSBexchange.com",
         "polarPlot":false
      }
   ],
   "srcFeed":1,
   "showSil":true,
   "showFlg":true,
   "showPic":true,
   "flgH":20,
   "flgW":85,
   "acList":[  
      {  
         "Id":11281748,
         "Rcvr":1,
         "HasSig":false,
         "Icao":"AC2554",
         "Bad":false,
         "Reg":"N882AS",
         "FSeen":"\/Date(1466467166951)\/",
         "TSecs":3,
         "CMsgs":1,
         "AltT":0,
         "Tisb":false,
         "TrkH":false,
         "Type":"CRJ2",
         "Mdl":"2001 
BOMBARDIER INC 
 CL-600-2B19",
         "Man":"Bombardier",
         "CNum":"7503",
         "Op":"EXPRESSJET AIRLINES INC - ATLANTA, GA",
         "OpIcao":"ASQ",
         "Sqk":"",
         "VsiT":0,
         "WTC":2,
         "Species":1,
         "Engines":"2",
         "EngType":3,
         "EngMount":1,
         "Mil":false,
         "Cou":"United States",
         "HasPic":false,
         "Interested":false,
         "FlightsCount":0,
         "Gnd":false,
         "SpdTyp":0,
         "CallSus":false,
         "TT":"a",
         "Trt":1,
         "Year":"2001"
      },
      {  
         "Id":11402205,
         "Rcvr":1,
         "HasSig":true,
         "Sig":110,
         "Icao":"ADFBDD",
         "Bad":false,
         "FSeen":"\/Date(1466391940977)\/",
         "TSecs":75229,
         "CMsgs":35445,
         "Alt":8025,
         "GAlt":8025,
         "AltT":0,
         "Call":"TEST1234",
         "Tisb":false,
         "TrkH":false,
         "Sqk":"0262",
         "Help":false,
         "VsiT":0,
         "WTC":0,
         "Species":0,
         "EngType":0,
         "EngMount":0,
         "Mil":true,
         "Cou":"United States",
         "HasPic":false,
         "Interested":false,
         "FlightsCount":0,
         "Gnd":true,
         "SpdTyp":0,
         "CallSus":false,
         "TT":"a",
         "Trt":1
      }
   ],
   "totalAc":4231,
   "lastDv":"636019887431643594",
   "shtTrlSec":61,
   "stm":1466467170029
}
</pre>

Sergey Bushmanov · Accepted Answer · 2020-07-09T13:20:54.023

44

If you already have your data in acList column in a pandas DataFrame, simply do:

import pandas as pd
pd.io.json.json_normalize(df.acList[0])

Alt AltT    Bad CMsgs   CNum    Call    CallSus Cou EngMount    EngType ... Sqk TSecs   TT  Tisb    TrkH    Trt Type    VsiT    WTC Year
0   NaN 0   False   1   7503    NaN False   United States   1   3   ...     3   a   False   False   1   CRJ2    0   2   2001
1   8025.0  0   False   35445   NaN TEST1234    False   United States   0   0   ... 0262    75229   a   False   False   1   NaN 0   0   NaN

Since pandas 1.0 the imports should be:

import pandas as pd
pd.json_normalize(df.acList[0])

edited Jul 09 '20 at 13:20

answered Sep 24 '17 at 15:28

Sergey Bushmanov

23,310
7
53
72

This is on Python 2.7, right? I am not able to do it on Python 3. Would you please help me with that? – Hamid Feb 14 '19 at 14:50
@Hamid This is Python 3. If you're having problems with your code post it to SO, tag with `Pandas`, `json` or whatsoever and we will try to help you. Cheers! – Sergey Bushmanov Feb 14 '19 at 15:03
What if one or more than one rows has `None` , meaning no data available for some records? – Regressor Mar 25 '19 at 03:39
@Regressor you can solve by replacing `None` by `{}`, can check [my answer](https://stackoverflow.com/a/62862747/7385033) – Felipe Augusto Jul 12 '20 at 15:09

score 22 · Answer 2 · answered Mar 04 '19 at 19:21

22

@Sergey's answer solved the issue for me but I was running into issues because the json in my data frame column was kept as a string and not as an object. I had to add the additional step of mapping the column:

import json
import pandas as pd
pd.io.json.json_normalize(df.acList.apply(json.loads))

answered Mar 04 '19 at 19:21

ThinkBonobo

15,487
9
65
80

3

Thank you worked for my case, where I had JSON data in the 1st column, and I wanted to transpose it to other columns – Mantej Singh Apr 23 '19 at 15:03
This works, but what if I have this column that I am converting into multiple, in addition to other columns that I want to keep as-is (they are regular columns). How can I achieve this? – Aly Sep 29 '21 at 04:56
not sure and I don't really have a pandas env setup anymore to check but maybe you can search for a 'flatMap' function? – ThinkBonobo Sep 29 '21 at 17:52

score 7 · Answer 3 · answered Jun 25 '20 at 19:39

7

Since pandas 1.0, json_normalize is available in the top-level namespace. Therefore use:

import pandas as pd
pd.json_normalize(df.acList[0])

answered Jun 25 '20 at 19:39

Jonne Kleijer

592
5
14

score 2 · Answer 4 · answered Sep 17 '19 at 00:41

2

I can't comment yet on ThinkBonobo's answer but in case the JSON in the column isn't exactly a dictionary you can keep doing .apply until it is. So in my case

import json
import pandas as pd

json_normalize(
    df
    .theColumnWithJson
    .apply(json.loads)
    .apply(lambda x: x[0]) # the inner JSON is list with the dictionary as the only item
)

answered Sep 17 '19 at 00:41

blockw

29
1

Useful contribution. On my data in Panda 0.24.2 the last line is not required. This may be specific to your data. I.e. mine looks like {'key':value} Perhaps yours looks like [{'key':value}] – jabberwocky Aug 19 '20 at 12:09

score 0 · Answer 5 · answered Jul 12 '20 at 15:07

In my case I had some missing values (None) then I created a more specific code that also drops the original column after creating the new ones:

for prefix in ['column1', 'column2']:
    df_temp = df[prefix].apply(lambda x: {} if pd.isna(x) else x)
    df_temp = pd.io.json.json_normalize(df_temp)
    df_temp = df_temp.add_prefix(prefix + '_')
    df.drop([prefix], axis=1, inplace=True)
    df = pd.concat([df, df_temp], axis = 1, sort=False)

How to convert JSON data inside a pandas column into new columns

5 Answers5

Linked