I have a collection of longitudes and latitudes, and I want to be able to extract the district of each of these coordinates using Python.

So far, I have written the following function using the geopy library:

import time  # needed for time.sleep() in the retry loop

from geopy.geocoders import Nominatim
from geopy.point import Point

MAX_RETRIES = 5

def get_district(lat, longi):
  geolocator = Nominatim(user_agent="http")
  point = Point(lat, longi)

  retries = 0

  while retries < MAX_RETRIES:
    retries += 1
    try:
      location = geolocator.reverse(point)
      district = location.raw['address']['state_district']
      return district
    except Exception:  # a bare except would also swallow KeyboardInterrupt
      print('Request failed.')
      print('Retrying..')
      time.sleep(2)

  print('Max retries exceeded.')
  return None

This works fine for a single point, but I have a number of them (approximately 10,000) and this only works for one coordinate at a time. There is no option to make bulk requests for several points.

Furthermore, this API becomes quite unreliable when making multiple such requests.

Is there a better way to achieve this using Python? I am open to any approach. Even if there is a file of sorts that I can find with a mapping of the coordinates against the districts, it works for me.

Note: At the moment, I am looking at coordinates in Sri Lanka.

Minura Punchihewa

1 Answer

You can use GeoPandas. First, download the shapefiles of Sri Lanka (DDL), then extract the second-level files (district, adm2). Finally:

# pip install geopandas
import geopandas as gpd
from shapely.geometry import Point

# you also need .shx file
gdf = gpd.read_file('lka_admbnda_adm2_slsd_20220816.shp')

def get_district(lat, longi):
    point = Point(longi, lat)  # swap longi and lat here
    return gdf.loc[gdf.contains(point), 'ADM2_EN'].squeeze()

Usage:

>>> get_district(6.927079, 79.861244)
'Colombo'

>>> get_district(9.661498, 80.025547)
'Jaffna'
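
For the roughly 10,000 points mentioned in the question, a single spatial join may be quicker than calling `get_district` once per point. The following is only a minimal sketch: the helper name `assign_districts`, the `lat`/`lon` column names, and the two rectangular "districts" are assumptions made up for illustration, and it assumes the district polygons use plain longitude/latitude coordinates (EPSG:4326). With the real data, `districts` would be the `gdf` loaded from the shapefile above.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box


def assign_districts(points_df, districts_gdf, name_col="ADM2_EN"):
    """Return points_df with a 'district' column (NaN where no polygon matches).

    Assumes points_df has 'lat' and 'lon' columns and a unique index, and that
    districts_gdf stores its polygons in a column named 'geometry'.
    """
    # Build point geometries in (x=lon, y=lat) order, in the polygons' CRS
    points = gpd.GeoDataFrame(
        points_df.copy(),
        geometry=gpd.points_from_xy(points_df["lon"], points_df["lat"]),
        crs=districts_gdf.crs,
    )
    # One vectorized join instead of one contains() call per point
    joined = gpd.sjoin(
        points,
        districts_gdf[[name_col, "geometry"]],
        how="left",
        predicate="within",
    )
    # If a point ever matches more than one polygon, keep the first match
    joined = joined[~joined.index.duplicated(keep="first")]
    return points_df.assign(district=joined[name_col])


# Hypothetical stand-in polygons so the sketch runs without the shapefile
districts = gpd.GeoDataFrame(
    {"ADM2_EN": ["West", "East"]},
    geometry=[box(79.0, 6.0, 80.0, 7.0), box(80.0, 6.0, 81.0, 7.0)],
    crs="EPSG:4326",
)
coords = pd.DataFrame({"lat": [6.5, 6.5, 9.0], "lon": [79.5, 80.5, 79.5]})
result = assign_districts(coords, districts)
print(result)
```

Points that fall outside every polygon (the third row here) come back with a missing district rather than raising an error, so they can be filtered or inspected afterwards.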
Corralien
  • A compressed GeoJSON version is available ([wetransfer](https://wetransfer.com/downloads/0ec41070125be4ca1e51a717b88ccab720230615071252/0108ca29fde3c235057f5f01cd25e4a620230615071308/57c3d7?trk=TRN_TDL_01&utm_campaign=TRN_TDL_01&utm_medium=email&utm_source=sendgrid)). Use `gdf = gpd.read_file('district.zip')` directly. – Corralien Jun 15 '23 at 07:15
  • I am attempting to run this using PySpark on Databricks, and for some reason it is taking a really long time. Do you have any other suggestions for how I can get this to work? I have looked at writing the GeoPandas DataFrame to a file, but I can't really make sense of the columns. – Minura Punchihewa Jun 20 '23 at 04:55