I have two data frames. One holds user ids with latitude/longitude data, and the other holds store codes with store latitude/longitude data. There are around 89M rows. For each user, I want the code of the nearest store (by minimum distance) to that user's lat/lon.
df1 -
id user_lat user_lon
1 13.031885 80.235574
2 19.099819 72.915288
3 22.226980 84.836070
df2 -
store_no s_lat s_lon
22 29.91 73.88
23 28.57 77.33
24 26.86 80.95
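For anyone who wants to reproduce this, the two sample frames above can be rebuilt with a short snippet. This is only a sketch of the three rows shown here, not the full data.

```python
import pandas as pd

# Sample rows copied from the df1 / df2 listings above
df1 = pd.DataFrame({
    "id": [1, 2, 3],
    "user_lat": [13.031885, 19.099819, 22.226980],
    "user_lon": [80.235574, 72.915288, 84.836070],
})
df2 = pd.DataFrame({
    "store_no": [22, 23, 24],
    "s_lat": [29.91, 28.57, 26.86],
    "s_lon": [73.88, 77.33, 80.95],
})
```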
What I have done so far -
import numpy as np
import pandas as pd
from sklearn.neighbors import DistanceMetric
dist = DistanceMetric.get_metric('haversine')
# keep only the coordinate columns
df1 = df1[['user_lat','user_lon']]
df2 = df2[['s_lat','s_lon']]
# cross join: every user row paired with every store row
x = pd.merge(df1.assign(k=1), df2.assign(k=1), on='k', suffixes=('1', '2')) \
    .drop(columns='k')
x.head(20)
user_lat user_lon s_lat s_lon
0 13.031885 80.235574 29.91 73.88
1 13.031885 80.235574 28.57 77.33
2 13.031885 80.235574 26.86 80.95
3 19.099819 72.915288 29.91 73.88
4 19.099819 72.915288 28.57 77.33
5 19.099819 72.915288 26.86 80.95
6 22.226980 84.836070 29.91 73.88
7 22.226980 84.836070 28.57 77.33
8 22.226980 84.836070 26.86 80.95
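# haversine works on radians; multiply by the Earth radius (about 6367 km) to get km
x['dist'] = np.ravel(dist.pairwise(np.radians(df2[['s_lat','s_lon']]),np.radians(df1[['user_lat','user_lon']])) * 6367)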
user_lat user_lon s_lat s_lon dist
0 13.031885 80.235574 29.91 73.88 1986.237557
1 13.031885 80.235574 28.57 77.33 1205.217610
2 13.031885 80.235574 26.86 80.95 1386.069611
3 19.099819 72.915288 29.91 73.88 1752.628427
4 19.099819 72.915288 28.57 77.33 1143.731258
5 19.099819 72.915288 26.86 80.95 1031.246453
6 22.226980 84.836070 29.91 73.88 1538.449674
7 22.226980 84.836070 28.57 77.33 1190.620278
8 22.226980 84.836070 26.86 80.95 647.477461
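One thing worth checking in the output above: `pairwise(stores, users)` returns a (stores x users) matrix, so `np.ravel` walks it store by store, while the cross join is ordered user by user. Row 1, for example, pairs latitudes 13.03 and 28.57 (about 15.5 degrees, so at least roughly 1700 km apart) but shows 1205 km. The sketch below passes the users first so the flattened distances should line up with the cross join, then keeps the closest row per user. It swaps in `sklearn.metrics.pairwise.haversine_distances` and `merge(how="cross")` (pandas 1.2 or later), and it keeps `id` and `store_no` in the frames; those choices are assumptions, not what the code above does.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import haversine_distances

EARTH_RADIUS_KM = 6367  # same radius as used above

users = pd.DataFrame({
    "id": [1, 2, 3],
    "user_lat": [13.031885, 19.099819, 22.226980],
    "user_lon": [80.235574, 72.915288, 84.836070],
})
stores = pd.DataFrame({
    "store_no": [22, 23, 24],
    "s_lat": [29.91, 28.57, 26.86],
    "s_lon": [73.88, 77.33, 80.95],
})

# Cross join: rows are ordered user by user, stores cycling within each user
x = users.merge(stores, how="cross")

# (n_users, n_stores) matrix; inputs must be [lat, lon] in radians
d = haversine_distances(
    np.radians(users[["user_lat", "user_lon"]].to_numpy()),
    np.radians(stores[["s_lat", "s_lon"]].to_numpy()),
)

# Row-major ravel of a (users x stores) matrix matches the cross-join order
x["dist"] = d.ravel() * EARTH_RADIUS_KM

# Keep, for each user id, the row with the smallest distance
nearest = x.loc[x.groupby("id")["dist"].idxmin()].reset_index(drop=True)
print(nearest[["id", "store_no", "dist"]])
```

This still materialises every user/store pair, so it is only practical for small inputs.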
But I want the data frame to look like this -
user_lat user_lon s_lat s_lon dist store_no
0 13.031885 80.235574 29.91 73.88 1986.237557 23
1 13.031885 80.235574 28.57 77.33 1205.217610 23
2 13.031885 80.235574 26.86 80.95 1386.069611 23
3 19.099819 72.915288 29.91 73.88 1752.628427 24
4 19.099819 72.915288 28.57 77.33 1143.731258 24
5 19.099819 72.915288 26.86 80.95 1031.246453 24
6 22.226980 84.836070 29.91 73.88 1538.449674 24
7 22.226980 84.836070 28.57 77.33 1190.620278 24
8 22.226980 84.836070 26.86 80.95 647.477461 24
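With around 89M rows, a full cross join is unlikely to fit in memory, so one possible direction is a spatial index. The following is a minimal sketch, assuming scikit-learn's `BallTree` with the `haversine` metric and assuming `id` and `store_no` are kept in the frames. The helper name `nearest_store` and the constant `EARTH_RADIUS_KM` are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6367  # same radius as used above


def nearest_store(users: pd.DataFrame, stores: pd.DataFrame) -> pd.DataFrame:
    """Attach the nearest store (by haversine distance) to each user row."""
    # BallTree's haversine metric expects [lat, lon] in radians
    tree = BallTree(
        np.radians(stores[["s_lat", "s_lon"]].to_numpy()),
        metric="haversine",
    )
    # k=1: only the single closest store; both results have shape (n_users, 1)
    dist, idx = tree.query(
        np.radians(users[["user_lat", "user_lon"]].to_numpy()), k=1
    )
    matched = stores.iloc[idx[:, 0]].reset_index(drop=True)
    out = users.reset_index(drop=True).copy()
    out["s_lat"] = matched["s_lat"]
    out["s_lon"] = matched["s_lon"]
    out["dist"] = dist[:, 0] * EARTH_RADIUS_KM  # unit-sphere distance -> km
    out["store_no"] = matched["store_no"]
    return out


users = pd.DataFrame({
    "id": [1, 2, 3],
    "user_lat": [13.031885, 19.099819, 22.226980],
    "user_lon": [80.235574, 72.915288, 84.836070],
})
stores = pd.DataFrame({
    "store_no": [22, 23, 24],
    "s_lat": [29.91, 28.57, 26.86],
    "s_lon": [73.88, 77.33, 80.95],
})

result = nearest_store(users, stores)
print(result)
```

This returns one row per user rather than one row per user/store pair. The tree is built once over the stores, so the users could be queried in chunks if the whole frame does not fit in memory.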