I have two pandas DataFrames that each contain longitude and latitude columns. I am trying to loop through the rows of the first and, for each row, find the closest longitude/latitude pair in the second DataFrame.

I have this in Python so far, which I found in another SO post:

from math import cos, asin, sqrt

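def distance(lat1, lon1, lat2, lon2):
    # Haversine great-circle distance in km; inputs are in decimal degrees
    p = 0.017453292519943295  # pi / 180, converts degrees to radians
    a = 0.5 - cos((lat2-lat1)*p)/2 + cos(lat1*p)*cos(lat2*p) * (1-cos((lon2-lon1)*p)) / 2
    return 12742 * asin(sqrt(a))  # 12742 km = 2 * mean Earth radius (6371 km)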

def closest(data, v):
    return min(data, key=lambda p: distance(v['lat'],v['lon'],p['lat'],p['lon']))

tempDataList = [{'lat': 39.7612992, 'lon': -86.1519681}, 
                {'lat': 39.762241,  'lon': -86.158436 }, 
                {'lat': 39.7622292, 'lon': -86.1578917}]

v = {'lat': 39.7622290, 'lon': -86.1519750}
print(closest(tempDataList, v))

I am about to try to modify this for use with my pandas DataFrames, but is there a more efficient way to do this, for example with PyProj?

Does anybody have an example or similar code?

fightstarr20

1 Answer

I think this will be a little easier with a GIS library. If you use geopandas and shapely, it is more convenient (pyproj is also used under the hood). Start with the code below.

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from shapely.ops import nearest_points

tempDataList = [{'lat': 39.7612992, 'lon': -86.1519681}, 
                {'lat': 39.762241,  'lon': -86.158436 }, 
                {'lat': 39.7622292, 'lon': -86.1578917}]

df = pd.DataFrame(tempDataList)

#make point geometry for geopandas
geometry = [Point(xy) for xy in zip(df['lon'], df['lat'])]

#use a coordinate system that matches your coordinates. EPSG 4326 is WGS84
gdf = gpd.GeoDataFrame(df, crs = "EPSG:4326", geometry = geometry) 

#build a Point for the query location (x = lon, y = lat)
v = {'lat': 39.7622290, 'lon': -86.1519750}
tp = Point(v['lon'], v['lat'])

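#now you can calculate the distance between v and every other point.
#note: the result is in CRS units (degrees for EPSG:4326); reproject with gdf.to_crs(...) to a projected CRS to get metres
print(gdf.distance(tp))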

#If you want to get the nearest point
#(newer geopandas releases deprecate unary_union in favour of union_all())
multipoints = gdf['geometry'].unary_union
queried_point, nearest_point = nearest_points(tp, multipoints)
print(nearest_point)
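
If you would rather stay in plain pandas and NumPy, the haversine formula from the question can probably be vectorised so that every row of the first DataFrame is compared with every row of the second in one step. The following is only a rough sketch: the helper name `nearest_rows`, the output column names, and the assumption that both frames have `lat` and `lon` columns in decimal degrees are all made up for illustration.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; 2 * 6371 = 12742, as in the question


def nearest_rows(df_a, df_b, lat="lat", lon="lon"):
    """Hypothetical helper: for each row of df_a, find the closest row of df_b.

    Returns a DataFrame indexed like df_a with the matching df_b index label
    and the haversine distance in kilometres.
    """
    # Column vectors for df_a and row vectors for df_b, so that NumPy
    # broadcasting yields a (len(df_a), len(df_b)) matrix of pairs.
    lat_a = np.radians(df_a[lat].to_numpy())[:, None]
    lon_a = np.radians(df_a[lon].to_numpy())[:, None]
    lat_b = np.radians(df_b[lat].to_numpy())[None, :]
    lon_b = np.radians(df_b[lon].to_numpy())[None, :]

    # Haversine formula (algebraically the same as the question's distance()).
    a = (np.sin((lat_b - lat_a) / 2) ** 2
         + np.cos(lat_a) * np.cos(lat_b) * np.sin((lon_b - lon_a) / 2) ** 2)
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

    # Position of the smallest distance in each row of the matrix.
    pos = dist.argmin(axis=1)
    return pd.DataFrame(
        {
            "nearest_idx": df_b.index[pos],
            "distance_km": dist[np.arange(len(df_a)), pos],
        },
        index=df_a.index,
    )


# Same sample data as above, split into two DataFrames.
df_b = pd.DataFrame([{'lat': 39.7612992, 'lon': -86.1519681},
                     {'lat': 39.762241,  'lon': -86.158436},
                     {'lat': 39.7622292, 'lon': -86.1578917}])
df_a = pd.DataFrame([{'lat': 39.7622290, 'lon': -86.1519750}])

result = nearest_rows(df_a, df_b)
print(result)
```

This builds a full distance matrix, so memory grows with len(df_a) * len(df_b). For large inputs a spatial index is likely a better fit, for example scikit-learn's BallTree with metric="haversine".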
Urban87