Extracting JSON in pandas column to separate columns while handling rows with None

Question

I have a pandas DataFrame called df that contains tweets, created by loading the Twitter JSON into the DataFrame. I am trying to extract the interesting information. The coordinates column is mostly None, but sometimes it contains GeoJSON in this format:

{'coordinates': [21.425775, 8.906141], 'type': 'Point'}

Here 21.425775 is the longitude and 8.906141 is the latitude. I would like to extract the latitude and longitude into separate columns. My pandas skills are still at a beginner level, so I am not sure how to use find and substring; there also seem to be better ways, as suggested in this question, which I don't fully understand.
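
Rather than searching for substrings, it is usually easier to treat each cell as a dict and unpack it. The sketch below is one possible way to handle a single cell; `to_lon_lat` is a hypothetical helper name, and it assumes the cell is either a GeoJSON Point dict, that dict's text form (as it would be if the column held strings), None, or a float NaN.

```python
import ast
import math


def to_lon_lat(value):
    """Hypothetical helper: return (longitude, latitude) for a GeoJSON Point cell.

    Missing cells (None or float NaN) become (nan, nan).
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return (math.nan, math.nan)
    if isinstance(value, str):
        # Assumption: the column may hold the dict's text form; parse it safely
        # instead of using find/substring.
        value = ast.literal_eval(value)
    lon, lat = value["coordinates"]  # GeoJSON positions are [longitude, latitude]
    return (lon, lat)


print(to_lon_lat({'coordinates': [21.425775, 8.906141], 'type': 'Point'}))
print(to_lon_lat("{'coordinates': [21.425775, 8.906141], 'type': 'Point'}"))
print(to_lon_lat(None))
```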

An example of the dataframe is:

  coordinates
0 None
1 {'coordinates': [21.425775, 8.906141], 'type': 'Point'}

How can I extract the information in the nested JSON column into separate pandas columns while gracefully handling the None values in the other rows?

{'coordinates': [21.425775, 8.906141], 'type': 'Point'} is a sample, another sample would be None — Superdooperhero, Jul 20 '18 at 07:29
Can you show a print of `df`? I am not able to understand how the column is... — Rakesh, Jul 20 '18 at 07:31

Rakesh · Accepted Answer · 2018-07-20T07:41:26.273

If the inner 'coordinates' value is a list, you can use tolist() with pd.DataFrame.

Ex:

import pandas as pd
import numpy as np

df = pd.DataFrame({'coordinates': [{'coordinates': [21.425775, 8.906141], 'type': 'Point'}, None]})
# Take the [longitude, latitude] list from each GeoJSON dict; use [NaN, NaN] for None rows
df['temp'] = df['coordinates'].apply(lambda x: x.get('coordinates') if isinstance(x, dict) else [np.nan, np.nan])
# Expand the two-element lists into two columns, aligned on the original index
df[['longitude', 'latitude']] = pd.DataFrame(df['temp'].tolist(), index=df.index)
df.drop('temp', axis=1, inplace=True)
print(df)

Output:

                                         coordinates  longitude  latitude
0  {u'type': u'Point', u'coordinates': [21.425775...  21.425775  8.906141
1                                               None        NaN       NaN
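
The same tolist()-style expansion can also be written without the temporary column. This is a sketch of one variant, not the answer's own code: it builds the [longitude, latitude] pairs with a list comprehension and checks `isinstance(d, dict)` instead of truthiness, so a float NaN cell (which is truthy) would also fall back to [NaN, NaN].

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'coordinates': [{'coordinates': [21.425775, 8.906141], 'type': 'Point'}, None]})

# One [lon, lat] pair per row; anything that is not a dict becomes [NaN, NaN].
pairs = [
    d['coordinates'] if isinstance(d, dict) else [np.nan, np.nan]
    for d in df['coordinates']
]

# Expand the pairs into two named columns, aligned on the original index.
df[['longitude', 'latitude']] = pd.DataFrame(
    pairs, columns=['longitude', 'latitude'], index=df.index
)
print(df[['longitude', 'latitude']])
```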

answered Jul 20 '18 at 07:07, edited Jul 20 '18 at 07:41

Gives me ValueError: Columns must be same length as key, presumably because of the 'type': 'Point' part. – Superdooperhero Jul 20 '18 at 07:25
Or possibly the None part – Superdooperhero Jul 20 '18 at 07:26
Updated snippet – Rakesh Jul 20 '18 at 07:41
Thanks! Works nicely. Why does it need the dropna part? – Superdooperhero Jul 20 '18 at 08:07
You are welcome :) and you are correct...you do not need dropna() – Rakesh Jul 20 '18 at 09:51
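
A small check of why dropna() has no effect there: the intermediate column holds one list per row, and Series.dropna() only removes elements that are themselves missing. A list that contains NaNs is a non-null object, so nothing is dropped. The series below is a stand-in for the answer's 'temp' column.

```python
import numpy as np
import pandas as pd

# Stand-in for the 'temp' column: each element is a two-item list.
temp = pd.Series([[21.425775, 8.906141], [np.nan, np.nan]])

# The [NaN, NaN] list is not itself a missing value, so both rows survive.
print(len(temp), len(temp.dropna()))
```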