A function that used to work (see the code in "Python Pandas 'apply' returns series; can't convert to dataframe") has stopped working. The only difference is that I'm now passing it a string concatenation.

import numpy as np
import pandas as pd
from geopy.geocoders import Nominatim

# Get geocode, return LAT and LON
def locate(x):
    geolocator = Nominatim()
    # Debug output: echo the input and make one unguarded geocode call
    print("'" + x + "'")
    location = geolocator.geocode(x)  # Get geocode
    print(location)
    lat = location.latitude
    lon = location.longitude
    try:
        # Get geocode
        location = geolocator.geocode(x, timeout=8, exactly_one=True)
        lat = location.latitude
        lon = location.longitude
    except Exception:
        # Didn't work for some reason that I really don't care about
        lat = np.nan
        lon = np.nan
        print(lat, lon)
    return pd.Series([lat, lon])

This works:

In [4]: locate('MOSCOW   123098 RUSSIA')
'MOSCOW   123098 RUSSIA'
Москва, Центральный административный округ, Москва, ЦФО, Россия
Out[4]:
0    55.751633
1    37.618704
dtype: float64

But this does not:

df_addr[['LAT','LON']] =  df_addr['COUNTRY'].apply(locate(df_addr['CITY'] + ' ' + \
                                                          df_addr['PROVINCE'] + ' ' + \
                                                          df_addr['STATE'] + ' ' + \
                                                          df_addr['ZIP_CODE'] + ' ' + \
                                                          df_addr['COUNTRY'])) # Geocode it!

I see the function echoing the correct input strings:

0                 'INNSBRUCK    AUSTRIA'
1           'BERN   CH-3001 SWITZERLAND'
2                 'INNSBRUCK    AUSTRIA'
3               'MOSCOW   123098 RUSSIA'
4               'MOSCOW   123098 RUSSIA'
5              'FREDERICK  MD 21702 USA'

Removing the try/except, I get the following fugly exception info:

.
.
99    'GLASGOW LANARK  G20 9NB SCOTLAND'
dtype: object
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
C:\Users\gn\Anaconda3\lib\site-packages\geopy\geocoders\base.py in _call_geocoder(self, url, timeout, raw, requester, deserializer, **kwargs)
    131         try:
--> 132             page = requester(url, timeout=(timeout or self.timeout), **kwargs)
    133         except Exception as error: # pylint: disable=W0703

C:\Users\gn\Anaconda3\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault)
    152         opener = _opener
--> 153     return opener.open(url, data, timeout)
    154 

C:\Users\gn\Anaconda3\lib\urllib\request.py in open(self, fullurl, data, timeout)
    460             meth = getattr(processor, meth_name)
--> 461             response = meth(req, response)
    462 

C:\Users\gn\Anaconda3\lib\urllib\request.py in http_response(self, request, response)
    570             response = self.parent.error(
--> 571                 'http', request, response, code, msg, hdrs)
    572 

C:\Users\gn\Anaconda3\lib\urllib\request.py in error(self, proto, *args)
    498             args = (dict, 'default', 'http_error_default') + orig_args
--> 499             return self._call_chain(*args)
    500 

C:\Users\gn\Anaconda3\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
    432             func = getattr(handler, meth_name)
--> 433             result = func(*args)
    434             if result is not None:

C:\Users\gn\Anaconda3\lib\urllib\request.py in http_error_default(self, req, fp, code, msg, hdrs)
    578     def http_error_default(self, req, fp, code, msg, hdrs):
--> 579         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    580 

HTTPError: HTTP Error 500: Internal Server Error

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
C:\Users\gn\Anaconda3\lib\site-packages\geopy\geocoders\base.py in _call_geocoder(self, url, timeout, raw, requester, deserializer, **kwargs)
    146                 try:
--> 147                     raise ERROR_CODE_MAP[code](message)
    148                 except KeyError:

KeyError: 500

During handling of the above exception, another exception occurred:

GeocoderServiceError                      Traceback (most recent call last)
<ipython-input-6-7412c2e27dd8> in <module>()
----> 1 df_addr[['LAT','LON']] =  df_addr['COUNTRY'].apply(locate(df_addr['CITY'] + ' ' +                                                           df_addr['PROVINCE'] + ' ' +                                                           df_addr['STATE'] + ' ' +                                                           df_addr['ZIP_CODE'] + ' ' +                                                           df_addr['COUNTRY'])) # Geocode it!
      2 df_addr.head()

<ipython-input-3-d957ac2e2e2e> in locate(x)
      3     geolocator = Nominatim()
      4     print("'" + x + "'")
----> 5     location = geolocator.geocode(x,timeout=20)  # Get geocode
      6     print(location)
      7     lat = location.latitude

C:\Users\gn\Anaconda3\lib\site-packages\geopy\geocoders\osm.py in geocode(self, query, exactly_one, timeout, addressdetails, language, geometry)
    190         logger.debug("%s.geocode: %s", self.__class__.__name__, url)
    191         return self._parse_json(
--> 192             self._call_geocoder(url, timeout=timeout), exactly_one
    193         )
    194 

C:\Users\gn\Anaconda3\lib\site-packages\geopy\geocoders\base.py in _call_geocoder(self, url, timeout, raw, requester, deserializer, **kwargs)
    147                     raise ERROR_CODE_MAP[code](message)
    148                 except KeyError:
--> 149                     raise GeocoderServiceError(message)
    150             elif isinstance(error, URLError):
    151                 if "timed out" in message:

GeocoderServiceError: HTTP Error 500: Internal Server Error

I am in over my head. I updated all libraries, but the problem persists.

Thanks in advance

Harvey
  • I don't see how that would work: you're passing Series and trying to concatenate them as an arg, not strings. You either have to explicitly pass each Series as a param, or pass the row and construct the str in the function – EdChum Apr 05 '15 at 22:40
  • Actually, is the problem that you have too many white spaces due to missing data? Your print output seems to indicate this. Does it still fail if you pass the string 'GLASGOW LANARK G20 9NB SCOTLAND'? – EdChum Apr 05 '15 at 22:41
  • Ed, I see what you are saying (that I'm passing a Series), but I'm not sure how to fix it, short of ditching apply and iterating through the table, or passing 5 parameters and then iterating through the table. I thought apply() did this for me, calling the function once for each row. The debug code seems to indicate this, as it says that x is a str type, not a Series. Hmmm... I believe you; I'm just trying to wrap my head around it and decide what to do next. And I manually confirmed the geocoder is insensitive to white spaces. Any further advice? – Harvey Apr 05 '15 at 23:31
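
The point raised in the comments can be checked offline. The sketch below is only an illustration: type_name is a hypothetical stand-in for locate, and the two small Series are made-up data. It suggests why the echoed strings looked correct even though the function received a whole Series: adding a quote character to a Series is applied element by element, so the printout shows one quoted string per row followed by dtype: object.

```python
import pandas as pd

def type_name(x):
    # Hypothetical stand-in for locate(): just report what kind of object arrived.
    return type(x).__name__

city = pd.Series(["BERN", "MOSCOW"])
country = pd.Series(["SWITZERLAND", "RUSSIA"])
combined = city + " " + country  # still a Series, one address per row

# Writing apply(type_name(combined)) evaluates the inner call first,
# so the function runs once and receives the whole Series.
eager = type_name(combined)              # 'Series'

# String concatenation broadcasts over a Series, which is why the debug
# print in the question showed one quoted address per row.
quoted = "'" + combined + "'"

# Passing the function itself lets pandas call it once per element.
per_element = combined.apply(type_name)  # every value is 'str'

print(eager)
print(quoted)
print(per_element.tolist())
```

The difference is only the parentheses: apply needs the function object, not the result of calling it.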

2 Answers


What you're doing is a little perverse, to be honest: you're calling apply on a Series and then trying to construct a str from several columns. That is the wrong way to go about it. You can call apply on the df and pass axis=1 so that each row is passed, and then either access each column in a lambda func and pass them to locate, or extract each column value inside locate. Alternatively, just create a Series from the concatenation of all the columns and call apply on that:

df_addr[['LAT','LON']] = (df_addr['CITY'] + ' ' + df_addr['PROVINCE'] + ' ' + df_addr['STATE'] + ' ' + df_addr['ZIP_CODE'] + ' ' + df_addr['COUNTRY']).apply(locate)

The above should work I believe.
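
For the row-wise alternative (apply on the df with axis=1), a minimal sketch might look like the following. It is an assumption-laden illustration, not the original code: fake_locate is a hypothetical offline stand-in for locate that reads from a small dict, the coordinates are rounded illustrative values, and the frame has only three of the five address columns.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table so the example runs without a geocoding service.
FAKE_COORDS = {"MOSCOW 123098 RUSSIA": (55.75, 37.62)}

def fake_locate(address):
    # Stand-in for locate(): return [lat, lon], or NaN when the address is unknown.
    lat, lon = FAKE_COORDS.get(address, (np.nan, np.nan))
    return pd.Series([lat, lon])

df_addr = pd.DataFrame({
    "CITY": ["MOSCOW", "BERN"],
    "ZIP_CODE": ["123098", "CH-3001"],
    "COUNTRY": ["RUSSIA", "SWITZERLAND"],
})

# axis=1 hands each row to the lambda, which builds the address string
# and passes that single str on to the lookup function.
coords = df_addr.apply(
    lambda row: fake_locate(" ".join([row["CITY"], row["ZIP_CODE"], row["COUNTRY"]])),
    axis=1,
)

# coords has one row per address and two columns (lat, lon).
df_addr[["LAT", "LON"]] = coords
print(df_addr)
```

Both routes call the function once per row with a plain str; concatenating the columns first is usually shorter, while axis=1 is handy when the function needs several columns separately.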

EdChum
  • I tried that and got the same error. I have fixed it... a kludge, but it worked. Thanks for helping me figure out what was going wrong. – Harvey Apr 07 '15 at 15:19

So based on Ed Chum's insight, I coded the following fugly kludge which worked:

# Create a summary address field in a new geo dataframe
df_geo = pd.DataFrame(columns=['BIG_ADDR', 'LAT', 'LON'])
df_geo['BIG_ADDR'] = df_addr['CITY'] + ' ' + df_addr['PROVINCE'] + ' ' + df_addr['STATE'] + ' ' + \
                     df_addr['ZIP_CODE'] + ' ' + df_addr['COUNTRY']
# Eliminate dups
df_geo = df_geo['BIG_ADDR'].drop_duplicates().reset_index()
del df_geo['index']  # drop the old row numbers so they don't end up in the merge

# Geocode ALL THINGS in GEO frame!
df_geo[['LAT','LON']] = df_geo['BIG_ADDR'].apply(locate)

# Create the same key in the address dataframe
df_addr['BIG_ADDR'] = df_addr['CITY'] + ' ' + df_addr['PROVINCE'] + ' ' + df_addr['STATE'] + ' ' + \
                      df_addr['ZIP_CODE'] + ' ' + df_addr['COUNTRY']

# Combine the address and geo frames
df_addr = pd.merge(df_addr, df_geo, on=['BIG_ADDR'], how='left')
# Cleanup: the _y suffix only appears if df_addr already had LAT/LON columns
df_addr.rename(columns={'LAT_y': 'LAT', 'LON_y': 'LON'}, inplace=True)
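
The useful part of this kludge is that each distinct address is geocoded only once. A more compact way to get the same effect, sketched here under assumptions, is to cache the results in a dict and map them back. fake_locate is a hypothetical offline stand-in for locate, and the addresses and coordinates are illustrative values only.

```python
import numpy as np
import pandas as pd

lookups = []  # records each address that is "geocoded", to show duplicates are skipped

def fake_locate(address):
    # Hypothetical offline stand-in for locate(); returns a (lat, lon) tuple.
    lookups.append(address)
    table = {"MOSCOW RUSSIA": (55.75, 37.62), "BERN SWITZERLAND": (46.95, 7.45)}
    return table.get(address, (np.nan, np.nan))

df_addr = pd.DataFrame(
    {"BIG_ADDR": ["MOSCOW RUSSIA", "BERN SWITZERLAND", "MOSCOW RUSSIA"]}
)

# Look up each distinct address once and keep the result in a dict.
coords = {addr: fake_locate(addr) for addr in df_addr["BIG_ADDR"].unique()}

# Map the cached coordinates back onto every row, duplicates included.
df_addr["LAT"] = df_addr["BIG_ADDR"].map(lambda a: coords[a][0])
df_addr["LON"] = df_addr["BIG_ADDR"].map(lambda a: coords[a][1])

print(len(lookups))  # number of lookups actually performed
print(df_addr)
```

With a real geocoding service, fewer lookups also means fewer chances of hitting a server error or a rate limit.
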
Harvey
  • Correction: I didn't try exactly what Ed Chum suggested... got close, but missed a detail. I'll try that next time though. Thanks - you need a tip jar! – Harvey Apr 07 '15 at 15:32
  • Kudos to Ed Chum. I went back and recoded it as he suggested and it worked perfectly. What's better is I understand why! So thank you, my new online friend! – Harvey Apr 07 '15 at 20:00