The problem is simple: I have two DataFrames,
one with 90,000 apartments and their latitude/longitude,
and one with 3,000 pharmacies and their latitude/longitude.
I want to create a new variable for all my apartments: 'distance to the nearest pharmacy'.
For this I tried two methods, both of which take too much time:
First method: I created a matrix with my apartments as rows and my pharmacies as columns, with the distance between them at each intersection; then I take the minimum of each row to get a column vector of 90,000 values.
I just used a double for loop with numpy:
import numpy as np

m, n = len(result['latitude']), len(pharma['lat'])
M = np.full((m, n), np.inf)  # inf so that non-matching pairs never win the min
for i in range(m):
    for j in range(n):
        # only compare an apartment with pharmacies of the same department
        if result['Code departement'][i] == pharma['departement'][j]:
            M[i, j] = (pharma['lat'][j] - result['latitude'][i])**2 \
                    + (pharma['lng'][j] - result['longitude'][i])**2
PS: I know that's the wrong formula for lat/long, but the apartments are all in the same region, so it's a good approximation.
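For reference, here is a minimal sketch of what a vectorized version of this double loop might look like with numpy broadcasting (same column names as above; I leave out the department filter to keep it short, and process apartments in chunks because the full 90,000 x 3,000 float64 matrix would need roughly 2 GB):

import numpy as np

apt_lat = result['latitude'].to_numpy()
apt_lng = result['longitude'].to_numpy()
ph_lat = pharma['lat'].to_numpy()
ph_lng = pharma['lng'].to_numpy()

nearest_sq = np.empty(len(apt_lat))
chunk = 5000  # chunking keeps each distance matrix around 120 MB
for start in range(0, len(apt_lat), chunk):
    end = start + chunk
    # broadcasting builds a (chunk, n) matrix of squared distances at once,
    # replacing the inner Python loops entirely
    d = (ph_lat[None, :] - apt_lat[start:end, None]) ** 2 \
      + (ph_lng[None, :] - apt_lng[start:end, None]) ** 2
    nearest_sq[start:end] = d.min(axis=1)  # squared distance to nearest pharmacy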
Second method: I used the solution from this topic (which is the same problem, but with less data): https://gis.stackexchange.com/questions/222315/geopandas-find-nearest-point-in-other-dataframe
I used geopandas and the nearest_points method:
from shapely.ops import nearest_points

pts3 = pharma.geometry.unary_union  # all pharmacies merged into one MultiPoint

def near(point, pts=pts3):
    # nearest_points returns (point, closest geometry in pts)
    nearest = pharma.geometry == nearest_points(point, pts)[1]
    return pharma[nearest].geometry.iloc[0]

appart['Nearest'] = appart.apply(lambda row: near(row.geometry), axis=1)
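(Side note: while writing this I noticed that recent geopandas releases, 0.10 and up, seem to have a built-in sjoin_nearest that would replace this per-row apply; a minimal sketch, assuming both frames are GeoDataFrames in the same CRS:

import geopandas as gpd

# joins each apartment to its nearest pharmacy in one call and
# writes the distance (in CRS units) into a new 'dist' column
joined = gpd.sjoin_nearest(appart, pharma, distance_col='dist')

I have not been able to test this with my geopandas version.)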
And as I said, both methods take too much time: after 1 hour of running, my PC/notebook crashed and it failed.
My final question: do you have an optimized method to go faster? Is that possible? If this is already optimized, I will buy another PC, but what criteria should I look for in a PC capable of doing such a calculation quickly?
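For what it's worth, here is the kind of approach I am wondering about but have not tried yet: a k-d tree query with scipy.spatial.cKDTree, which should avoid comparing all 90,000 x 3,000 pairs. A minimal sketch, assuming the same column names as above and ignoring the department filter (distances are in degrees, under the same flat approximation as my first method; 'dist_nearest_pharmacy' is just a name I made up):

import numpy as np
from scipy.spatial import cKDTree

# build the tree on the 3,000 pharmacies once...
tree = cKDTree(np.column_stack([pharma['lat'], pharma['lng']]))

# ...then query the nearest pharmacy for all 90,000 apartments in one call
dist, idx = tree.query(
    np.column_stack([result['latitude'], result['longitude']]), k=1)

result['dist_nearest_pharmacy'] = dist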