
I have a series of city names in a pandas DataFrame. For each city I need to look up its address and store it in a separate column of the same DataFrame. The city column also contains NaN values. Looking up the address for a single location or city name works on its own, but the same approach does not work on the DataFrame.

data = [['madurai',10],['NaN',12],['hosur',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
from geopy.geocoders import Nominatim
geolocator = Nominatim()
for i in df.Name:
    if i == "NaN":
       continue
    loc = geolocator.geocode(i)
address = loc.address
print(address)

This runs on the DataFrame, but it returns only the last address instead of one address per city. If I change the order as below,

data = [['Nan',10],['Madurai',12],['hosur',13]]
df = pd.DataFrame(data,columns=['Name','Age'])

I get the error: GeocoderTimedOut: Service timed out

Questions: 1. How can I save the resulting addresses in a column? 2. How should I handle the NaN values?

Kavikayal

3 Answers


You can add a column with the addresses in this way:

import pandas as pd
data = [['madurai',10],['NaN',12],['hosur',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
from geopy.geocoders import Nominatim
geolocator = Nominatim()
for i in df.Name:
    if i == "NaN":
        continue
    df.loc[df.Name == i, 'Address'] = geolocator.geocode(i)

print(df)
dzang
  • Getting an error: ValueError: setting an array element with a sequence – Kavikayal Mar 17 '19 at 10:07
  • The problem is that `geolocator.geocode(i)` returns a sequence instead of a single value. What does `print(geolocator.geocode(i))` return? I don't have your data so I cannot reproduce it, but if you put a generic string in `df.loc[df.Name == i, 'Address'] = 'address_string'` the code works as expected, so try to fix that part. Maybe `geolocator.geocode(i)[0]` is enough... – dzang Mar 17 '19 at 14:33
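
Following up on the comment above: as far as I know, `geocode` returns a geopy `Location` object (or `None` when nothing matches), and its `address` attribute holds the text. Here is a minimal sketch that stores that text in the column instead of the object. The names `address_or_none`, `FakeLocation` and `fake_geocode` are hypothetical, and the fake geocoder with made-up addresses only stands in for `geolocator.geocode` so the example runs offline:

```python
import pandas as pd

def address_or_none(name, geocode, placeholders=("NaN", "Nan", "nan", "")):
    """Return the address text for `name`, or None if it cannot be geocoded."""
    # Skip real missing values as well as placeholder strings such as "NaN".
    if pd.isna(name) or str(name).strip() in placeholders:
        return None
    location = geocode(name)
    # Assumption: the geocoder returns None when it finds no match.
    return location.address if location is not None else None

# Stand-in for a geopy Location (hypothetical, illustration only).
class FakeLocation:
    def __init__(self, address):
        self.address = address

# Stand-in for geolocator.geocode with made-up addresses.
def fake_geocode(name):
    known = {
        "madurai": "Madurai, Tamil Nadu, India",
        "hosur": "Hosur, Tamil Nadu, India",
    }
    address = known.get(name.lower())
    return FakeLocation(address) if address else None

df = pd.DataFrame([["madurai", 10], ["NaN", 12], ["hosur", 13]],
                  columns=["Name", "Age"])
# One plain string (or None) per row, so the assignment is a simple column.
df["Address"] = df["Name"].apply(lambda n: address_or_none(n, fake_geocode))
print(df)
```

With a real geocoder you would pass `geolocator.geocode` in place of `fake_geocode`.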

You only get the last value because you keep replacing `loc` on each pass through the loop and read `loc.address` only once, after the loop has finished. The GeocoderTimedOut: Service timed out error arises because you are making too many requests to the server. You should include a sleep between the requests. If you still get this error, take a look at this: Link - Avoid time out

Try:

import pandas as pd
from geopy.geocoders import Nominatim
import time

data = [['madurai',10],['NaN',12],['hosur',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
geolocator = Nominatim(user_agent='test')
address = []
for i in df.Name:
    time.sleep(3)
    if i == "NaN":
        address.append('NaN')
        continue
    address.append(geolocator.geocode(i))

df['address'] = address
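
If a fixed sleep is not enough, retrying the failed request after a pause is another option. This is only a sketch under assumptions: `geocode_with_retry` and `flaky_geocode` are hypothetical names, the exception type to retry on is passed in by the caller, and the demo uses a stand-in geocoder and a stubbed `sleep` so it runs offline and instantly:

```python
import time

def geocode_with_retry(geocode, query, retry_on, attempts=3, pause=1.0,
                       sleep=time.sleep):
    """Call geocode(query), retrying when a `retry_on` exception is raised."""
    for attempt in range(1, attempts + 1):
        try:
            return geocode(query)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(pause * attempt)  # wait a little longer after each failure

# Offline demo: a stand-in geocoder that times out twice, then answers.
calls = {"count": 0}

def flaky_geocode(query):
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "address of " + query

waits = []  # records the requested pauses instead of really sleeping
result = geocode_with_retry(flaky_geocode, "madurai",
                            retry_on=TimeoutError, sleep=waits.append)
print(result, waits)
```

With geopy you would presumably pass `geolocator.geocode` as `geocode` and `GeocoderTimedOut` (from `geopy.exc`) as `retry_on`, leaving `sleep` at its default.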
vmouffron

I added a delay between requests, as below, plus a few lines to show a progress bar:

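from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
from tqdm import tqdm

# `final` is the DataFrame whose 'city' column holds the city names.
# Recent geopy versions expect an explicit user_agent for Nominatim.
geolocator = Nominatim(user_agent='test')
# Wait at least one second between consecutive requests.
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

# Register progress_apply on pandas objects, then geocode with a progress bar
# (plain final['city'].apply(geocode) does the same without the bar).
tqdm.pandas()
final['Geolocation'] = final['city'].progress_apply(geocode)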

It works now.
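
One more idea that cuts the number of requests when the same city appears in many rows: geocode each distinct name once and reuse the result. This is a rough sketch, not something the code above already does; `geocode_unique` and `fake_geocode` are hypothetical names, and the fake geocoder is there only so the example runs offline:

```python
def geocode_unique(names, geocode, skip=("NaN", "Nan", "nan", "")):
    """Geocode each distinct name once and return a {name: result} dict."""
    results = {}
    for name in names:
        # Ignore non-string entries (such as float NaN) and placeholder strings.
        if not isinstance(name, str) or name.strip() in skip:
            continue
        if name not in results:
            results[name] = geocode(name)
    return results

# Stand-in geocoder that records which names were actually requested.
requested = []

def fake_geocode(name):
    requested.append(name)
    return name.title() + ", India"

cities = ["madurai", "NaN", "hosur", "madurai", float("nan"), "hosur"]
lookup = geocode_unique(cities, fake_geocode)
print(lookup)
print(requested)
```

On a DataFrame, something like `final['city'].map(lookup)` should then fill the column and leave NaN for the skipped rows; in practice `geocode` would be the rate-limited function rather than the fake one.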

Kavikayal