
What am I looking for?

I am looking for a Python library with a method/module that takes coordinates and returns the country name, without connecting to an external API.

Why?

I have a pandas df with a lot of rows (well over 10,000), and I don't want to send a request for each row.

This is what I am doing now:

    import math

    from geopy.geocoders import ArcGIS
    ...
    ...
    ...

    geolocator = ArcGIS(scheme='http')

    for index, row in df.iterrows():
        if math.isnan(row['latitude']) or math.isnan(row['longitude']):
            continue
        try:
            location = geolocator.reverse((row['latitude'], row['longitude']), timeout=30)
            # take the country (the last component of the returned address string)
            location = str(location)
            if len(location.split(",")) == 4:
                country = location.split(",")[3][1:]
                df.at[index, 'country'] = country
        except Exception:
            # request failed or timed out; leave this row's country empty
            continue

If it is possible to send one request for all the rows, that is still fine.

Dean Taler
    Why not just download a shapefile containing country boundaries and labels (like [this one](https://hub.arcgis.com/datasets/a21fdb46d23e4ef896f31475217cbb08_1)), and then use [geopandas](https://geopandas.org/) to figure out in which country a point is located? This doesn't require any remote access once you have the shapefile. – larsks Sep 14 '20 at 13:20

2 Answers


There seem to be no off-the-shelf libraries, but one possible way to do this without external APIs is the approach suggested in the comment above: download a shapefile containing country boundaries and use geopandas to determine which country each point falls in.
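
A minimal sketch of that idea, assuming the shapefile has been downloaded locally (the file name `countries.shp` and the attribute column `COUNTRY` are placeholders; check the actual field names of the shapefile you pick) and reusing the `df` with `latitude`/`longitude` columns from the question:

    import geopandas as gpd

    # Country polygons downloaded beforehand; path and column names depend
    # on the shapefile you choose.
    countries = gpd.read_file("countries.shp")

    # Keep only rows with usable coordinates and turn them into point geometries.
    valid = df.dropna(subset=["latitude", "longitude"])
    points = gpd.GeoDataFrame(
        valid,
        geometry=gpd.points_from_xy(valid["longitude"], valid["latitude"]),
        crs="EPSG:4326",  # assumes the shapefile is also in WGS84; reproject if not
    )

    # Spatial join: each point inherits the attributes of the polygon it lies in.
    # Older geopandas releases use op="within" instead of predicate="within".
    joined = gpd.sjoin(points, countries[["COUNTRY", "geometry"]],
                       how="left", predicate="within")

    # A point exactly on a border can match more than one polygon; keep the first.
    joined = joined[~joined.index.duplicated(keep="first")]
    df.loc[joined.index, "country"] = joined["COUNTRY"]

Once the shapefile is on disk, this runs entirely offline and handles all rows in one pass instead of one request per row.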

Paul Crease

If it is possible to send one request for all the rows, that is still fine.

I think this is API dependent, and I don't think anyone would allow something like that (nevertheless, the final word rests with the documentation of each API).

Having said that, I don't see how you could escape making one request per entry.

One possibility is to evaluate how fine-grained the information you're looking for needs to be, and then decide whether you can reuse a single result for more than one row, but that requires some extra work.
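
One rough way to do that, as a sketch only (it reuses `geolocator` and `df` from the question, and country-level precision is assumed to be enough): round the coordinates before the lookup and cache one result per rounded pair, so nearby rows share a single request.

    import math
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def country_for(lat, lon):
        # One reverse-geocoding request per distinct (rounded) coordinate pair.
        parts = str(geolocator.reverse((lat, lon), timeout=30)).split(",")
        return parts[3].strip() if len(parts) == 4 else None

    for index, row in df.iterrows():
        if math.isnan(row['latitude']) or math.isnan(row['longitude']):
            continue
        # 0.1 degree is roughly 11 km: coarse, and points near a border may
        # get the wrong country, but it cuts the number of requests a lot.
        df.at[index, 'country'] = country_for(round(row['latitude'], 1),
                                              round(row['longitude'], 1))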

I once had to do exactly the same thing, and although it was a bit tedious, the best solution I could find was to make all the requests. Here are some ideas which may help:

  1. You can split the data into chunks so it's all more manageable in case something goes wrong.
  2. Periodically save the results to disk (a sketch of points 1 and 2 follows this list).
  3. You may consider running the code on Google Colab or Kaggle, so you don't depend on your own PC staying on and keeping its connection alive for the whole duration of the task (I find this a huge plus; I don't want to rely on my PC or connection for time-consuming tasks ;) ).
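
A rough sketch of points 1 and 2, where `lookup_country` is a placeholder for the per-row geopy lookup from the question (assumed to return None for missing coordinates or failed requests):

    import os
    import pandas as pd

    chunk_size = 500  # arbitrary; tune to taste

    for start in range(0, len(df), chunk_size):
        path = f"countries_chunk_{start}.csv"
        if os.path.exists(path):   # makes the run resumable after an interruption
            continue
        chunk = df.iloc[start:start + chunk_size].copy()
        chunk['country'] = chunk.apply(
            lambda row: lookup_country(row['latitude'], row['longitude']), axis=1)
        chunk.to_csv(path)         # periodic save of the partial results

    # Once all chunks exist, stitch them back together.
    result = pd.concat(pd.read_csv(f"countries_chunk_{start}.csv", index_col=0)
                       for start in range(0, len(df), chunk_size))
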
89f3a1c