I have a point cloud of N points in D-dimensional space with periodic boundary conditions, where N can range from 500 to 10^8 and D can range from 1 to 20. The distribution of points varies wildly, from completely uniform to heavily clumped. For each point in the cloud I need to find its k nearest neighbours. I also need to count how many points lie within a given distance of each point, measured in the maxnorm (Chebyshev) distance. I only need the count, not the identities of the points within that radius, although having them would be a nice addition.
I've tried kd-trees, but the implementation I used doesn't handle the wrapping boundaries, and for the larger point clouds, working around that by duplicating points as periodic images is not feasible. Additionally, kd-trees get slow at higher dimensions.
I've just come across Vantage Point Trees and tried out some code, but it is slower than the kd-tree. To be fair, the code I found uses a recursive search with no batching. On the positive side, it handles the wrapping conditions natively, so it doesn't require duplication.
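The reason the VP tree copes with wrapping is that it only ever calls a distance function, so the periodicity can live entirely inside the metric. Here is a minimal sketch of the wrapped maxnorm I have in mind, assuming the same period `box` along every axis (or a length-D array of periods). The name `torus_maxnorm` and the `box` parameter are just for illustration, not from any library:

```python
import numpy as np

def torus_maxnorm(a, b, box=1.0):
    """Maxnorm (Chebyshev) distance between a and b on a flat torus.

    `box` is an assumed period: a scalar, or a length-D array of side lengths.
    """
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    diff = np.mod(diff, box)             # tolerate coordinates outside [0, box)
    diff = np.minimum(diff, box - diff)  # go the shorter way around each axis
    return float(diff.max())

# Two points near opposite edges of the unit box are close once wrapped.
p = [0.05, 0.50]
q = [0.95, 0.40]
print(torus_maxnorm(p, q))  # roughly 0.1, not 0.9
```

Each per-axis wrapped difference is a metric on the circle, and the maximum of metrics is again a metric, so the triangle inequality that VP-tree pruning relies on still holds.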
I'm about to see if I can squeeze some more performance out of the VP tree by converting the search to an iterative approach and batching the queries, but I had a thought. All these data structures are designed to find nearest neighbours of arbitrary query points, while my query points are restricted to points already in the point cloud. I figure this restriction might allow for a more performant structure (maybe a nav-mesh of sorts?). I tried searching for structures that would handle this, but my google-fu is failing me. So I'm just wondering if anyone knows of a data structure that meets the following requirements:
- Handle both small and large numbers of points, i.e., 500 to 10^8 points
- Handle up to 20 dimensions
- Work with periodic boundaries (i.e., a flat torus)
- Work with maxnorm distance (a soft requirement: Euclidean can give me a candidate list which I can cull manually, but maxnorm would be preferred)
- Can find the k-NN of a query point, as well as count how many points lie within a given distance of the query point
- Query points are only points in the structure, not arbitrary points
- Queries can be batched, i.e., I need to find the k-th NN for every point in the point cloud. I also need to find how many points lie within d[i] of each point i. That is, each point has its own search radius.
- Doesn't need to support insertion or deletion.
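
To pin down exactly what the two batched queries should return, here is a brute-force reference sketch. It is O(N^2 D), so it is only usable at the small end of my range and is meant as a correctness check, not a solution. Assumptions that are mine and not fixed above: the box has the same period `box` on every axis, coordinates lie in [0, box), a point does not count as its own neighbour, and the radius test is inclusive (`<=`). The function name and the `chunk` parameter are just illustrative:

```python
import numpy as np

def brute_force_reference(points, k, radii, box=1.0, chunk=256):
    """Reference answers under the periodic maxnorm (requires k <= N - 1).

    points : (N, D) array, coordinates assumed to lie in [0, box)
    k      : which neighbour to report (1 = nearest, self excluded)
    radii  : (N,) array, per-point search radius d[i]
    Returns (kth_dist, counts); counts exclude the point itself.
    """
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)
    n = len(points)
    kth_dist = np.empty(n)
    counts = np.empty(n, dtype=np.int64)
    for start in range(0, n, chunk):          # process queries in blocks
        stop = min(start + chunk, n)
        diff = np.abs(points[start:stop, None, :] - points[None, :, :])
        diff = np.minimum(diff, box - diff)   # wrap each axis
        dist = diff.max(axis=2)               # (block, N) maxnorm distances
        rows = np.arange(stop - start)
        dist[rows, start + rows] = np.inf     # exclude self-matches
        kth_dist[start:stop] = np.partition(dist, k - 1, axis=1)[:, k - 1]
        counts[start:stop] = (dist <= radii[start:stop, None]).sum(axis=1)
    return kth_dist, counts

# Usage on a small random cloud with a different radius per point.
rng = np.random.default_rng(0)
pts = rng.random((500, 3))
d = rng.uniform(0.05, 0.15, size=500)
kth, cnt = brute_force_reference(pts, k=4, radii=d)
```

Whatever structure gets suggested, I would check it against this output on the small cases before scaling up.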
Thanks