0

I have a list of lat/long coordinates and need to obtain the state for each. That can be done with the code:

import pandas as pd
import geocoder

df = pd.read_csv('SOL_A.dsv', delimiter='|', low_memory=False)
geom_states = []
for index, row in df.iterrows():
    lat = row['LAT']
    lon = row['LONG']
    # reverse geocode each coordinate pair via Nominatim
    g = geocoder.osm([lat, lon], method='reverse')
    st = '_UN'  # fallback when no state comes back
    if g.state is not None:
        st = g.state
    geom_states.append(st)
df['STATE'] = geom_states

But for my ~5k records it eventually starts returning status code 429 from https://nominatim.openstreetmap.org/search: ERROR - 429 Client Error: Too Many Requests for url: https://nominatim.openstreetmap.org/search?q=0.0%2C+0.0&format=jsonv2&addressdetails=1&limit=1, which is expected.

I only have to process this once and don't mind if it takes a whole day. I read through the OSM Acceptable Use Policy and it says:

  • No heavy uses (an absolute maximum of 1 request per second).
  • Provide a valid HTTP Referer or User-Agent identifying the application (stock User-Agents as set by HTTP libraries will not do).
  • Clearly display attribution as suitable for your medium.
  • Data is provided under the ODbL license which requires to share alike (although small extractions are likely to be covered by fair usage / fair dealing).

So.. It should be possible (?)

I tried adding my API key (geocoder.osm([lat, lon], method='reverse', key=API_KEY)) and also added a time.sleep(1.1) before each call to be sure, but it didn't really help.
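In case it helps, this is roughly the throttled version I had in mind, calling the reverse endpoint directly with requests; the User-Agent string and contact address are just placeholders for something identifying my script:

import time
import requests

# Self-identifying User-Agent, as the usage policy asks for
# (the name and contact address below are placeholders).
HEADERS = {'User-Agent': 'state-lookup-script/0.1 (me@example.com)'}
REVERSE_URL = 'https://nominatim.openstreetmap.org/reverse'

def reverse_state(lat, lon):
    params = {'lat': lat, 'lon': lon, 'format': 'jsonv2', 'addressdetails': 1}
    r = requests.get(REVERSE_URL, params=params, headers=HEADERS, timeout=30)
    r.raise_for_status()
    return r.json().get('address', {}).get('state', '_UN')

geom_states = []
for index, row in df.iterrows():
    geom_states.append(reverse_state(row['LAT'], row['LONG']))
    time.sleep(1.1)  # stay below 1 request per second

df['STATE'] = geom_states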

Ideas?

filippo
  • You don't need OSM for that. At which detail level do you need to identify the state? Country, state, county...? e.g. do you need to identify it's the US, or it's California, or it's Fresno? – Joe Apr 02 '20 at 05:48
  • Then download some file with the geographic data of your region, e.g. a GeoJSON or a Shape file. Then import the data into a framework (Shapely, GeoPandas, there are many, see https://medium.com/@chrieke/essential-geospatial-python-libraries-5d82fcc38731) – Joe Apr 02 '20 at 05:52
  • Then there usually is a function to find the polygon where your lat,lon are in. Very low level would be something like https://streamhacker.com/2010/03/23/python-point-in-polygon-shapely/ – Joe Apr 02 '20 at 05:53
  • https://shapely.readthedocs.io/en/latest/manual.html#polygons – Joe Apr 02 '20 at 05:54
  • https://shapely.readthedocs.io/en/latest/manual.html#object.contains – Joe Apr 02 '20 at 05:54
  • But these are very low level functions, you can probably find a very convenient function that just takes a GeoJSON file and your lat,lon as a numpy array. – Joe Apr 02 '20 at 05:55
  • thanks, these are great tips. Will try that (roughly as sketched below), was already installing my own Nominatim – filippo Apr 02 '20 at 13:21
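For reference, a rough sketch of the offline point-in-polygon approach Joe suggests, assuming a state-boundaries file such as states.geojson with a NAME column (file name and column are placeholders) and a recent GeoPandas (older versions take op= instead of predicate= in sjoin):

import geopandas as gpd
import pandas as pd

df = pd.read_csv('SOL_A.dsv', delimiter='|', low_memory=False)

# State polygons from a local file (file name and NAME column are assumptions).
states = gpd.read_file('states.geojson')

# Build point geometries from the lat/long columns, in the same CRS
# as the boundary file (WGS84 here).
points = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df['LONG'], df['LAT']),
    crs='EPSG:4326',
)

# Spatial join: each point picks up the attributes of the polygon it falls within.
joined = gpd.sjoin(points, states[['NAME', 'geometry']], how='left', predicate='within')
df['STATE'] = joined['NAME'].fillna('_UN')

No network calls involved, so there is no rate limit to worry about.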

1 Answer

0

Nominatim Usage Policy clearly states:

  • No heavy uses (an absolute maximum of 1 request per second).
  • Provide a valid HTTP Referer or User-Agent identifying the application (stock User-Agents as set by HTTP libraries will not do).
  • Clearly display attribution as suitable for your medium.
  • Data is provided under the ODbL license which requires to share alike (although small extractions are likely to be covered by fair usage / fair dealing).

Looks like you don't limit your requests to a maximum of 1 per second. Also I'm not sure whether you send a valid HTTP Referer or User-Agent header identifying your application.

Note that this usage policy only applies to OSM's public Nominatim instance. You can always install your own Nominatim service or switch to an alternative/commercial Nominatim instance.
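For example, a self-hosted instance accepts the same reverse query without the public rate limit (the host and port below are placeholders):

import requests

# Placeholder base URL of a self-hosted Nominatim instance.
NOMINATIM_REVERSE = 'http://localhost:8080/reverse'

params = {'lat': 37.77, 'lon': -122.42, 'format': 'jsonv2', 'addressdetails': 1}
r = requests.get(NOMINATIM_REVERSE, params=params, timeout=30)
state = r.json().get('address', {}).get('state', '_UN')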

scai
  • thanks! shouldn't the ```time.sleep(1.1)``` limit the queries? Will also look into the HTTP Referer, reckon that matters more than my API_KEY. – filippo Apr 02 '20 at 13:22
  • Yes, it should. Not sure what your API key is for; OSM's public Nominatim instance neither needs nor supports API keys. – scai Apr 02 '20 at 13:56