
I have the following dataframe:

id  ip  
1   219.237.42.155
2   75.74.144.120
3   219.237.42.155

Using the maxminddb-geolite2 package, I can find out which city a specific IP is assigned to. The following code:

from geolite2 import geolite2
reader = geolite2.reader()
reader.get('219.237.42.155')

returns a dictionary, and by looking up its keys I can get the city name:

reader.get('219.237.42.155')['city']['names']['en']

returns:

'Beijing'

The problem is that I don't know how to look up the city for each IP in the dataframe and put it in a third column, so that the result would be:

id  ip              city
1   219.237.42.155  Beijing
2   75.74.144.120   Hollywood
3   219.237.42.155  Beijing

The farthest I got was mapping the whole dictionary to a separate column using:

df['city'] = df['ip'].apply(lambda x: reader.get(x))

However, extracting the city name directly:

df['city'] = df['ip'].apply(lambda x: reader.get(x)['city']['names']['en'])

throws a KeyError. What am I missing?
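A likely explanation for the KeyError: chained `[...]` indexing fails as soon as any level of the record is missing, and maxminddb's `reader.get()` returns `None` for addresses that are not in the database. The sketch below uses a hypothetical helper, `city_from_record` (not part of geolite2), that walks the nested record with `dict.get` instead. The records are hand-written stand-ins shaped like the dictionary above, not real database output.

```python
from typing import Optional


def city_from_record(record: Optional[dict], lang: str = "en") -> Optional[str]:
    """Return the city name from a GeoLite2-style record, or None if any level is missing."""
    if not isinstance(record, dict):
        # reader.get() returns None for addresses that are not in the database
        return None
    # Each .get() falls back to an empty dict, so the chain never raises KeyError
    return record.get("city", {}).get("names", {}).get(lang)


# Hand-written records shaped like the question's output (illustrative only)
full = {"city": {"names": {"en": "Beijing"}}, "country": {"iso_code": "CN"}}
no_city = {"country": {"iso_code": "CN"}}

print(city_from_record(full))     # Beijing
print(city_from_record(no_city))  # None
print(city_from_record(None))     # None
```

With the real reader this would be used as `df['city'] = df['ip'].apply(lambda ip: city_from_record(reader.get(ip)))`.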

codeless
    Perhaps one or more `ip`s cause `reader.get` to raise an Exception. What is the error message? What Exception is raised? – unutbu May 25 '17 at 19:18
  • KeyError: 'city'. If I use a try...except clause, it populates the third column with blanks only. – codeless May 25 '17 at 19:23
  • `KeyError` tells me that it's returning a dictionary, just not with the keys you expected. Try `lambda x: reader.get(x).get('city', dict(names=dict(en='NA')))['names']['en']` – piRSquared May 25 '17 at 19:55
  • That doesn't work either. I found out that in the original dataset some records are indeed missing the 'city' key, while others don't return a dictionary at all (obviously for IPs such as 127.0.0.1). For now I've managed to loop through the dataframe with a `for` loop, but I was looking for something more elegant/less heavy. – codeless May 25 '17 at 20:12
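Since the same IP can appear on several rows (rows 1 and 3 above), one lighter alternative to a row-by-row loop is to look up each distinct address once and broadcast the results back with `Series.map`. This is a minimal sketch under stated assumptions: `add_city_column` is a hypothetical helper, and a plain dict's `.get` stands in for `reader.get`, with made-up records (the `203.0.113.7` entry is a documentation-range address used only to show a record without `'city'`).

```python
from typing import Callable, Optional

import numpy as np
import pandas as pd


def add_city_column(df: pd.DataFrame, lookup: Callable[[str], Optional[dict]]) -> pd.DataFrame:
    """Return a copy of df with a 'city' column, calling lookup() once per distinct IP."""
    def city(ip: str):
        record = lookup(ip)
        if not isinstance(record, dict):
            # No record at all (e.g. 127.0.0.1)
            return np.nan
        # Missing 'city' or 'names' levels fall through to NaN
        return record.get("city", {}).get("names", {}).get("en", np.nan)

    # Resolve each unique address once, then broadcast back to every row
    cities = {ip: city(ip) for ip in df["ip"].unique()}
    out = df.copy()
    out["city"] = out["ip"].map(cities)
    return out


# Hypothetical stand-in for reader.get, backed by a plain dict (illustrative data only)
fake_db = {
    "219.237.42.155": {"city": {"names": {"en": "Beijing"}}},
    "203.0.113.7": {"country": {"iso_code": "US"}},  # record without 'city'
}
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "ip": ["219.237.42.155", "203.0.113.7", "219.237.42.155", "127.0.0.1"],
})
result = add_city_column(df, fake_db.get)
print(result)
```

With the real package the call would be `add_city_column(df, reader.get)`. Caching per unique IP matters mostly when addresses repeat often; for mostly distinct IPs it behaves like a plain `apply`.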

1 Answer

# Use apply to check that a record exists (reader.get returns None otherwise) and has a 'city' key before accessing its values.
import numpy as np
df['ip'].apply(reader.get).apply(lambda r: r['city']['names']['en'] if r and 'city' in r else np.nan)
Out[39]: 
0    Beijing
1        NaN
2    Beijing
dtype: object
Allen Qin