You can adapt the answer I gave to your other linked question in a very straightforward way. The outcome is also very fast (~425 ms for 10K points in a
).
Edit: For sparse cases (where the number of points actually filtered is a small portion of the complete grid), I also add a sparse_grid_within_radius()
version below.
(Important: use scipy >= 1.6.0
, where the Python implementation of KDTree
has been replaced by cKDTree
. See the release notes).
# please use scipy >= 1.6.0
from scipy.spatial import KDTree
def grid_within_radius(a, radius=10, resolution=(1,1), origin=(0,0)):
pmin = (a - origin).min(axis=0) - radius
pmax = (a - origin).max(axis=0) + radius
imin = np.floor(pmin / resolution)
imax = np.ceil(pmax / resolution) + 1
xv, yv = np.meshgrid(np.arange(imin[0], imax[0]), np.arange(imin[1], imax[1]))
grid = np.stack((xv, yv), axis=-1) * resolution + origin
dist, _ = KDTree(a).query(grid, k=1, distance_upper_bound=radius + 0.01)
idx = dist <= radius
return grid[idx], dist[idx]
Usage
First, the OP's example:
np.random.seed(123)
a = np.random.randint(10, size=(100, 2))
g, d = grid_within_radius(a)
To compare with the OP's result, we need to sort their solution (coord_set, dists
):
def sort2d(a, other=None):
other = a if other is None else other
return other[np.lexsort((a[:, 0], a[:, 1]))]
With this, we can check that our solution is the same:
>>> np.allclose(g, sort2d(coord_set))
True
>>> np.allclose(d, sort2d(coord_set, dists))
True
And another example (with a different grid resolution and radius):
g, d = grid_within_radius(a, radius=0.6, resolution=(.11, .47))
plt.scatter(*a.T, s=10, c='r')
plt.scatter(*g.T, s=1)
Speed
a = np.random.randint(1000, size=(10_000, 2))
%timeit grid_within_radius(a)
# 425 ms ± 528 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Sparse version
The above works well when the number of points returned is a significant portion of the total grid (e.g. 30% or more). But for very sparse cases (e.g. where a combination of the bounding box of a
, the radius
and the grid resolution
result in generating a huge grid and then eliminate most of it), then it is slow. To illustrate, the following is a sparse case for a = np.random.randint(0, 200, (10, 2))
:
The version below fixes that by generating "patches" of grid around quantized locations instead of the full grid.
import numpy as np
import pandas as pd
from scipy.spatial import KDTree
def unique_rows(a):
# np.unique(a, axis=0) is slow, in part because it sorts;
# using pandas.drop_duplicates() instead.
# see https://github.com/numpy/numpy/issues/11136#issuecomment-822000680
return pd.DataFrame(a).drop_duplicates().values
def sparse_grid_within_radius(a, radius=10, resolution=(1,1), origin=(0,0), rbox=1):
resolution = np.array(resolution)
box_size = radius * rbox
nxy0 = np.floor(radius / resolution).astype(int)
nxy1 = np.ceil((box_size + radius) / resolution).astype(int) + 1
patch = np.stack(list(map(np.ravel, np.indices(nxy0 + nxy1))), axis=1) - nxy0
ar = np.floor((a - origin) / box_size) * box_size
ia = unique_rows(np.floor(ar / resolution).astype(int))
grid = unique_rows((ia[:, None] + patch).reshape(-1, 2)) * resolution + origin
dist, _ = KDTree(a).query(grid, k=1, distance_upper_bound=radius * 1.01)
idx = dist <= radius
return grid[idx], dist[idx]
With this, even very sparse results are fast.
Example
a = np.random.randint(0, 4000, (100, 2))
%timeit g, d = grid_within_radius(a)
# 3.88 s ± 10.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit gs, ds = sparse_grid_within_radius(a)
# 29.6 ms ± 24.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Speed comparison
import perfplot
perfplot.show(
setup=lambda n: np.random.randint(0, n, (100, 2)),
kernels=[grid_within_radius, sparse_grid_within_radius],
n_range=[2 ** k for k in range(14)],
equality_check=lambda a, b: np.allclose(sort2d(a[0]), sort2d(b[0])),
)
