I have a set of 2000 geospatial points (lon/lat), which I need to match with several other geospatial datasets (I am using Geopandas GeoDataFrames). I am using the sklearn BallTree function to find the neighbors within a certain radius of each point (in the function below, point is one of the 2000 points and right_gdf is the dataset that I need to get the neighbors from).
I am currently using a for-loop to loop through all of the 2000 points and find the neighbors for each of them. However, depending on the size of right_gdf, this can take a long time. I am sure there is a way to speed this process up, potentially with parallel computing, but I am struggling to find it. I tried to use Dask delayed to parellelise the loop (see code below) but somehow this takes even longer than the simple for loop.
# Function that finds a point's neighbors within a certain radius
def neighbours_radius(point, right_gdf, R=1):
# Create tree from the right gdf (use haversine for lat/lon coordinates)
tree = BallTree(right_gdf, leaf_size=40, metric='haversine')
# Find indices of all neighbors within R
indices = tree.query_radius(point, r=r)[0]
return indices
# Function that loops through the 2000 points
def knn_gpd(right_gdf, R=75):
# Load the gdf with the 2000 points
base = gpd.read_file(...)
# Empty list to fill in the indices of the neighbors
neighbors = []
# Loop through the points and find the neighbors within R.
for i in range(len(base)):
point = base.iloc[i:i+1,:]
ind = neighbours_radius(point, right_gdf, R=R)
# append index lists
neighbors.append(ind)
return neighbors
# Function that loops through the 2000 points with Dask delayed
def knn_gpd_dask(right_gdf, R=75):
# Load the gdf with the 2000 points
base = gpd.read_file(...)
# Empty list to fill in the indices of the neighbors
neighbors = []
# Loop through the points and find the neighbors within R.
for i in range(len(base):
point = base.iloc[i:i+1,:]
ind = delayed(neighbours_radius)(point, right_gdf, R=R)
# append index list
neighbors.append(ind)
result = compute(neighbors)
return result
Can anyone help me speed up this process?