I have 39,803 different lat lon points (here are the first 5 points)
lat lon
0 27.269987 -82.497004
1 27.537598 -82.508422
2 27.337793 -82.533753
3 27.497719 -82.570261
4 27.512062 -82.722158
That I've calculated the haversine distance matrix for
array([[ 0. , 29.77832527, 8.36846516, ..., 30.07072193,
7.60700598, 7.63477669],
[ 29.77832527, 0. , 22.35749757, ..., 2.6836159 ,
22.17639199, 23.07090099],
[ 8.36846516, 22.35749757, 0. , ..., 23.07825172,
3.10333262, 0.75441483],
...,
[ 30.07072193, 2.6836159 , 23.07825172, ..., 0. ,
22.53766911, 23.75965211],
[ 7.60700598, 22.17639199, 3.10333262, ..., 22.53766911,
0. , 3.03795035],
[ 7.63477669, 23.07090099, 0.75441483, ..., 23.75965211,
3.03795035, 0. ]])
thus my matrix is 39,803 by 39,803. I would like to figure out a way to find all pairs that had a distance of less than 25 meters. For instance, if we consider the first array
[ 0. , 29.77832527, 8.36846516, ..., 30.07072193,
7.60700598, 7.63477669]
the pair
(lat[0],lon[0])-(lat[2],lon[2])=(27.269987,-82.497004)-(27.337793,-82.533753) = 8.36846516
satisfies this criteria but
(lat[0],lon[0])-(lat[1],lon[1])=(27.269987,-82.497004)-(27.537598,-82.508422) = 29.77832527
does not. I'd like to get a subset of points that satisfy this criteria. This is what I have so far:
X=df[['lat','lon']].dropna(axis=0)
coors=np.array(X)
from math import radians, cos, sin, asin, sqrt
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt
import pandas as pd
def haversine(lonlat1, lonlat2):
lat1, lon1 = lonlat1
lat2, lon2 = lonlat2
lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
# haversine formula
dlon = lon2 - lon1
dlat = lat2 - lat1
a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
c = 2 * asin(sqrt(a))
r = 6371 # Radius of earth in kilometers. Use 3956 for miles
return c * r
distance_matrix = squareform(pdist(X, (lambda u,v: haversine(u,v))))
I'd greatly appreciate any help on how I could potentially compute this