I have two datasets (CSV files). Each contains the latitudes and longitudes of a set of points (220 points in one, 4400 in the other). I want to compute the pairwise distances in miles between the two sets (a 220 x 4400 matrix). How can I do that in Python? It is similar to this problem: https://gist.github.com/rochacbruno/2883505
-2
-
Are you asking about the math, or what exactly was the problem when you tried to do that? – mkrieger1 Jul 28 '20 at 20:38
-
What are "stop ids"? – mkrieger1 Jul 28 '20 at 20:39
-
@mkrieger1, just consider them (stop ids) a set of points. I edited that part of the question. In simple words, I am looking for a python code to calculate distances between latitude longitude pairs. – Asad Tan Jul 28 '20 at 20:55
-
Why don’t you write this code yourself, that would be faster. Why don’t you use the square root of the sum of the squared differences between two latitudes and two longitudes? – mkrieger1 Jul 28 '20 at 21:02
1 Answer
2
Best is to use sklearn, which has exactly what you ask for.
Say we have some sample data:
import numpy as np
import pandas as pd

towns = pd.DataFrame({
    "name": ["Merry Hill", "Spring Valley", "Nesconset"],
    "lat": [36.01, 41.32, 40.84],
    "long": [-76.7, -89.20, -73.15],
})
museum = pd.DataFrame({
    "name": ["Motte Historical Car Museum, Menifee", "Crocker Art Museum, Sacramento", "World Chess Hall Of Fame, St.Louis", "National Atomic Testing Museum, Las", "National Air and Space Museum, Washington", "The Metropolitan Museum of Art", "Museum of the American Military Family & Learning Center"],
    "lat": [33.743511, 38.576942, 38.644302, 36.114269, 38.887806, 40.778965, 35.083359],
    "long": [-117.165161, -121.504997, -90.261154, -115.148315, -77.019844, -73.962311, -106.381531],
})
You can use sklearn's distance metrics, which include an implementation of the haversine distance:
# In scikit-learn versions before 1.0, use: from sklearn.neighbors import DistanceMetric
from sklearn.metrics import DistanceMetric
dist = DistanceMetric.get_metric('haversine')
After you extract the numpy array values with
places_gps = towns[["lat", "long"]].values
museum_gps = museum[["lat", "long"]].values
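you simply compute
EARTH_RADIUS = 6371.009  # mean Earth radius in km
# The haversine metric expects [lat, long] pairs in radians and returns angles in radians
haversine_distances = dist.pairwise(np.radians(places_gps), np.radians(museum_gps))
haversine_distances *= EARTH_RADIUS
to get the distances in km. If you need miles, multiply by 0.621371 (the number of miles in one kilometre).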
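As a cross-check on the sklearn result, the haversine formula can also be written by hand with the standard library only. This is a minimal sketch, not what sklearn runs internally: it assumes a spherical Earth with a mean radius of 3958.76 miles (6371.009 km expressed in miles), and the names `haversine_miles` and `pairwise_miles` are made up for illustration.

```python
from math import radians, sin, cos, asin, sqrt

# Assumption: spherical Earth, mean radius 6371.009 km expressed in miles.
EARTH_RADIUS_MILES = 3958.76


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_MILES * asin(min(1.0, sqrt(a)))


def pairwise_miles(points_a, points_b):
    """Distance matrix as a list of rows, one row per (lat, lon) in points_a."""
    return [
        [haversine_miles(lat_a, lon_a, lat_b, lon_b) for lat_b, lon_b in points_b]
        for lat_a, lon_a in points_a
    ]


towns_gps = [(36.01, -76.7), (41.32, -89.20), (40.84, -73.15)]
museum_gps = [(38.644302, -90.261154), (38.887806, -77.019844), (40.778965, -73.962311)]
miles = pairwise_miles(towns_gps, museum_gps)  # 3 rows x 3 columns
```

For 220 x 4400 points this plain-Python loop is fine for checking a few values, but the vectorised sklearn call is the better choice for the full matrix.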
If you are only interested in the closest few points, or in all points within a radius, check out sklearn's BallTree, which also supports the haversine metric. For those queries it is much faster than computing the full distance matrix.
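A nearest-neighbour lookup with BallTree might look like the sketch below. It reuses the sample coordinates from above as plain arrays; the choice of `k=2`, the 300-mile radius, and the radius constant of 3958.76 miles are assumptions made for illustration. As with the distance metric, inputs are [lat, long] in radians and the returned distances are angles in radians.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_MILES = 3958.76  # assumption: mean Earth radius in miles

towns_gps = np.array([[36.01, -76.7], [41.32, -89.20], [40.84, -73.15]])
museum_gps = np.array([
    [33.743511, -117.165161],
    [38.576942, -121.504997],
    [38.644302, -90.261154],
    [36.114269, -115.148315],
    [38.887806, -77.019844],
    [40.778965, -73.962311],
    [35.083359, -106.381531],
])

# Build the tree on the larger set (museums here), then query with the towns.
tree = BallTree(np.radians(museum_gps), metric="haversine")

# Two closest museums per town: row i holds results for town i, nearest first.
dist_rad, idx = tree.query(np.radians(towns_gps), k=2)
dist_miles = dist_rad * EARTH_RADIUS_MILES

# Indices of all museums within 300 miles of each town (one array per town).
within_300 = tree.query_radius(np.radians(towns_gps), r=300 / EARTH_RADIUS_MILES)
```

Here `idx` holds row positions in `museum_gps`, so they can be mapped back to names with something like `museum["name"].values[idx]`.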
Edit: To convert the output to a dataframe, use for instance
pd_distances = pd.DataFrame(haversine_distances, index=towns["name"], columns=museum["name"])
pd_distances
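If a long table with one row per (town, museum) pair is preferred over the matrix, the dataframe can be reshaped with `melt`. The sketch below uses a small stand-in matrix with made-up values so that it runs on its own; the column names `town`, `museum` and `distance` are illustrative choices.

```python
import pandas as pd

# Stand-in distance matrix with made-up values (rows: towns, columns: museums).
pd_distances = pd.DataFrame(
    [[10.0, 20.0], [30.0, 40.0]],
    index=["Town A", "Town B"],
    columns=["Museum X", "Museum Y"],
)

long_distances = (
    pd_distances.rename_axis(index="town", columns="museum")  # label both axes
    .reset_index()                                            # town becomes a column
    .melt(id_vars="town", var_name="museum", value_name="distance")
    .sort_values(["town", "distance"], ignore_index=True)     # nearest first per town
)
```

With the real data this gives 220 x 4400 = 968,000 rows, one per pair of points.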

Willem Hendriks
-
Thanks a lot. Instead of array output, could you please tell how can I get the output in dataframe format? Say, an output consisting of 3 columns in this case (towns, museum, distance). – Asad Tan Jul 28 '20 at 22:40
-
The output is a distance-matrix. If you want to sort them by distance for each town and create a list that is possible, but won't be a nice dataframe as you get lists/arrays as data. So the 3 columns suggestion is a little hard for me to create. I made the more straightforward distance matrix. – Willem Hendriks Jul 29 '20 at 10:54