1

I have a pandas dataframe called df that contains twitter tweets created by taking the twitter json and loading it into the dataframe. I am trying to extract the interesting information. The coordinates column is mostly None, but sometimes it contains GeoJSON in this format:

{'coordinates': [21.425775, 8.906141], 'type': 'Point'}

Here 21.425775 refers to the longitude and 8.906141 refers to the latitude. I would like to extract the latitude and longitude into separate columns. Unfortunately my pandas skills are more towards the beginner level, so I am not sure how to do find and substring; also there seems to be better ways as suggested in this question which I don't fully understand.

An example of the dataframe is:

  coordinates
0 None
1 {'coordinates': [21.425775, 8.906141], 'type': 'Point'}

How can I extract the information in the nested JSON column into separate pandas columns while gracefully handling the None values in the other rows?

Superdooperhero
  • 7,584
  • 19
  • 83
  • 138

1 Answers1

1

If your 'coordinates' is a list then you can use tolist() with pd.DataFrame

Ex:

import pandas as pd
import numpy as np

df = pd.DataFrame({'coordinates': [{'coordinates': [21.425775, 8.906141], 'type': 'Point'}, None]})
df['temp'] = df['coordinates'].apply(lambda x: x.get("coordinates") if x else [np.nan, np.nan]).dropna()
df[['longitude','latitude']] = pd.DataFrame(df.temp.values.tolist(), index= df.index)
df.drop('temp', axis=1, inplace=True)
print(df)

Output:

                                         coordinates  longitude  latitude
0  {u'type': u'Point', u'coordinates': [21.425775...  21.425775  8.906141
1                                               None        NaN       NaN
Rakesh
  • 81,458
  • 17
  • 76
  • 113