What spatial indexing algorithm should I use?

Question

I want to implement some kind of spatial indexing data structure for my MKAnnotations. Currently it's horribly slow when I try to filter them based on distance criteria (3-4k locations, currently extremely slow with a simple double for ...).

I'd like to create clusters of MKAnnotations by deciding whether each one is close to another. Also, these locations are in a rough (creation) order, and a "previous"/"next" functionality to "jump" between them would be nice (this is not a must). I've read about kd-tree and R-tree structures, and both seem to offer the fast distance/neighbor lookups I need for filtering/clustering, but I'm not sure which is best for me or whether there are other options too. What algorithm/data structure should I use?

Update: I store these locations in a Core Data database; they represent a path. When the map is opened they are fetched into an array, and then I just use that array for distance calculations and annotation creation. When the user moves/zooms the map, I loop through them and decide what needs to be changed on the map, so the whole thing is fairly static. As I understand it, if I used a tree, I could store the locations there and, when a zoom/move happens, just search it to obtain the ones in the new region. Is this true?

Even in the dynamic case, when I can add new locations to this array, it would be a single insertion, and that happens rarely.

Has QUIT--Anony-Mousse · Accepted Answer · 2012-10-05T12:44:11.917

8

It depends a lot on what your usage patterns are (how many writes, for example, and whether you are in-memory or on-disk) and what your data looks like (that is, how it is distributed).

R-trees are good because they are balanced, and allow updating. The R*-tree in my experience is clearly better than the other variants because of the split strategy it has. The benefit is that it produces more square pages than the other strategies, so that for many queries you will need to scan fewer pages.

kd-trees are good if you are in-memory and static. Updating them works badly: you will need to rebuild the index quite often.
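To make the static case concrete, here is a minimal sketch of a 2-d tree that is built once by median splits and then only queried. It assumes the locations are already planar (x, y) tuples (for example projected map points, not raw latitude/longitude); the names `build_kdtree` and `within_radius` are made up for this illustration.

```python
from math import hypot

def build_kdtree(points, depth=0):
    """Build a static 2-d tree from (x, y) tuples using median splits."""
    if not points:
        return None
    axis = depth % 2                          # alternate between x and y
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }

def within_radius(node, center, radius, found=None):
    """Collect all points at most `radius` away from `center`."""
    if found is None:
        found = []
    if node is None:
        return found
    p, axis = node["point"], node["axis"]
    if hypot(p[0] - center[0], p[1] - center[1]) <= radius:
        found.append(p)
    diff = center[axis] - p[axis]
    # Descend only into the sides that the query circle can reach.
    if diff - radius <= 0:
        within_radius(node["left"], center, radius, found)
    if diff + radius >= 0:
        within_radius(node["right"], center, radius, found)
    return found

points = [(0, 0), (1, 1), (5, 5), (2, 0), (10, 10)]
tree = build_kdtree(points)
near_origin = within_radius(tree, (0, 0), 2.0)
```

There is deliberately no insert function here: adding points later would unbalance the tree, which is why a rebuild is the usual answer.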

And if your data does not change very often, bulk-loading the R-tree works very well. You can do Sort-Tile-Recursive (STR) bulk loading, which essentially requires (partially) sorting your data on X and Y alternately, so it takes a low O(n log n) to build the tree. It is very similar to bulk-loading a kd-tree, except that you multi-split instead of binary splitting. This is very popular.
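A rough sketch of STR bulk loading for point data, under some assumptions: points are planar (x, y) tuples, pages are plain dicts, and the names `str_pack`, `str_bulk_load` and `iter_points` are invented for this example. It uses full sorts rather than partial sorts to keep the code short. Each pass sorts by x, cuts the data into vertical strips, sorts each strip by y, and packs runs of `capacity` entries into one page; the pass is repeated on the resulting pages until a single root remains.

```python
from math import ceil, sqrt

def bbox(boxes):
    """Smallest (minx, miny, maxx, maxy) box covering all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def str_pack(entries, capacity):
    """One STR pass: group entries into pages of at most `capacity`."""
    pages = ceil(len(entries) / capacity)
    slices = ceil(sqrt(pages))                # number of vertical strips
    per_slice = slices * capacity
    # min + max is twice the center, so this orders by box center on x.
    by_x = sorted(entries, key=lambda e: e["box"][0] + e["box"][2])
    packed = []
    for i in range(0, len(by_x), per_slice):
        strip = sorted(by_x[i:i + per_slice],
                       key=lambda e: e["box"][1] + e["box"][3])
        for j in range(0, len(strip), capacity):
            children = strip[j:j + capacity]
            packed.append({"box": bbox([c["box"] for c in children]),
                           "children": children})
    return packed

def str_bulk_load(points, capacity=4):
    """Build the tree bottom-up from (x, y) tuples; returns the root page."""
    if not points or capacity < 2:
        raise ValueError("need at least one point and capacity >= 2")
    level = [{"box": (x, y, x, y), "point": (x, y)} for x, y in points]
    while True:
        level = str_pack(level, capacity)
        if len(level) == 1:
            return level[0]

def iter_points(node):
    """Yield every point stored below `node` (handy for checking the build)."""
    if "point" in node:
        yield node["point"]
    else:
        for child in node["children"]:
            yield from iter_points(child)

pts = [(i * 7 % 23, i * 11 % 19) for i in range(50)]
root = str_bulk_load(pts, capacity=4)
```

The multi-way split is visible in `str_pack`: one sorted strip is cut into several pages at once, where a kd-tree build would cut it in two.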

Furthermore, you can keep track of the number of objects in each page. When displaying things on a map, you may want to stop early when a page would display too small on the screen (i.e. smaller than a marker). At this point, you would not scan that page, but only take the number of objects and display that as a clustered marker until the user zooms in.
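As a sketch of that idea: a window query that skips pages outside the visible region and stops early when a page is smaller than a marker. The dict layout, the helper names (`leaf`, `page`, `query`) and the `min_size` parameter (marker size converted to map units) are assumptions for illustration, and the tiny tree is built by hand only to keep the example short.

```python
def intersects(a, b):
    """True if boxes a and b, each (minx, miny, maxx, maxy), overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def leaf(x, y):
    return {"box": (x, y, x, y), "point": (x, y), "count": 1}

def page(children):
    boxes = [c["box"] for c in children]
    box = (min(b[0] for b in boxes), min(b[1] for b in boxes),
           max(b[2] for b in boxes), max(b[3] for b in boxes))
    # Keep the number of objects below this page next to its bounding box.
    return {"box": box, "children": children,
            "count": sum(c["count"] for c in children)}

def query(node, window, min_size, markers=None):
    """Collect markers for `window`; pages smaller than `min_size`
    (in map units) are not scanned but reported as one cluster."""
    if markers is None:
        markers = []
    box = node["box"]
    if not intersects(box, window):
        return markers                        # skip pages outside the view
    if "point" in node:
        markers.append(("point", node["point"], 1))
        return markers
    if max(box[2] - box[0], box[3] - box[1]) < min_size:
        center = ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)
        markers.append(("cluster", center, node["count"]))
        return markers                        # stop early, use the count
    for child in node["children"]:
        query(child, window, min_size, markers)
    return markers

root = page([page([leaf(0, 0), leaf(1, 1)]),
             page([leaf(50, 50), leaf(90, 60)])])
zoomed_out = query(root, (-10, -10, 100, 100), min_size=5)
zoomed_in = query(root, (-1, -1, 2, 2), min_size=0.1)
```

Zoomed out, the two nearby points collapse into one cluster marker with count 2; zoomed in, they come back as individual points. Note that a clustered page reports its full count even if it only partly overlaps the window, which is an approximation.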

For 2D data, with a limited value domain, do not overlook the simple things. Quadtrees can work really well, too! Simplicity can make it a lot easier to optimize things. Or a classic grid approach. If your users tend to spread their annotations in an area (and not put them all into one place), you can just compute integer x,y grid coordinates, and then hash them and make a list for each grid cell.
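The grid approach fits in a few lines. This is only a sketch under assumptions: coordinates are planar, the cell size is chosen by hand, and the `GridIndex` class and its method names are hypothetical.

```python
from collections import defaultdict
from math import floor, hypot

class GridIndex:
    """Hash of integer grid cells, each holding a list of its objects."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _key(self, x, y):
        # Integer grid coordinates of the cell containing (x, y).
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def insert(self, x, y, item):
        self.cells[self._key(x, y)].append((x, y, item))

    def within(self, x, y, radius):
        """Items at most `radius` from (x, y); only nearby cells are scanned."""
        kx0, ky0 = self._key(x - radius, y - radius)
        kx1, ky1 = self._key(x + radius, y + radius)
        hits = []
        for kx in range(kx0, kx1 + 1):
            for ky in range(ky0, ky1 + 1):
                for px, py, item in self.cells.get((kx, ky), ()):
                    if hypot(px - x, py - y) <= radius:
                        hits.append(item)
        return hits

index = GridIndex(cell_size=10)
for x, y, name in [(1, 1, "a"), (12, 1, "b"), (95, 95, "c"), (-3, -3, "d")]:
    index.insert(x, y, name)
close_to_origin = index.within(0, 0, 5)
```

A cell size close to the typical query radius keeps the number of scanned cells small; if most points land in a single cell, this degrades to a linear scan.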

edited Oct 05 '12 at 12:44

answered Oct 03 '12 at 21:02

Has QUIT--Anony-Mousse


See my usage pattern edits above. I don't want to use grid-like hashes, they don't look good and for my case they aren't suitable. I'll check out the R* tree right now. – Templar Oct 04 '12 at 07:48
If your data is static, bulk-loading an R-tree (there is no difference between bulk-loaded R and R*, as they differ on insertion only) is also an option, try sort-tile-recursive. – Has QUIT--Anony-Mousse Oct 04 '12 at 07:53
Note that with grid-like hashes I'm only referring to your data structure, NOT to the visual representation to the user. You just *group* objects for organization if they fall into the same grid cell. On display, you only scan through those cells that you need to show. A tree barely can do better. – Has QUIT--Anony-Mousse Oct 04 '12 at 07:54
Thanks for your answer, I'll accept it soon. I'd just like to clarify one thing: when the map is opened, I'll build this R/R* tree. If I understood the algorithm correctly, all I have to do once the tree is in memory is query the rectangles that intersect the visible region of the map and maybe use some clustering on the result. Am I right? – Templar Oct 05 '12 at 08:59
Yes, if you do range queries on an R/R*-tree you "refine" all pages that intersect with your query window, and skip those that do not overlap. This straightforward way of querying an R-tree is what makes it so attractive. For building the tree, STR is really simple and will probably work well. Whether you do want some marker clustering afterwards I don't know. But you could maybe sometimes stop early: if a whole page is smaller than a few pixels on your display, you could turn the page into a clustered marker, too. You can keep object counts in the R-tree pages easily - this is a common trick. – Has QUIT--Anony-Mousse Oct 05 '12 at 12:40
R-tree indices for spatial queries have been added to Core Data in Xcode 9, see the WWDC 2017 "What's new in Core Data" video. – malhal Jul 11 '17 at 22:21
LOL. R-trees are defined as balanced trees, and they try to sell a completely unbalanced tree as an R-tree. And apparently only for range search (no distance search), and up to 32 bit? – Has QUIT--Anony-Mousse Jul 12 '17 at 06:14

score 0 · Answer 2 · edited Jun 20 '20 at 09:12

I am no iOS developer, but I looked over the docs and found this:

MKMapView.annotationsInMapRect:

Returns the annotation objects located in the specified map rectangle.

- (NSSet *)annotationsInMapRect:(MKMapRect)mapRect

Parameters

mapRect: The portion of the map that you want to search for annotations.

Return Value

The set of annotation objects located in mapRect.

Discussion

This method offers a fast way to retrieve the annotation objects in a particular portion of the map. This method is much faster than doing a linear search of the objects in the annotations property yourself.

This suggests that MKMapView already organizes annotations in a spatial index structure. Would this method meet your needs?

If not, I would look for existing open source implementations of any 2D spatial indexing structure and pick the one with the best documentation, cleanest interfaces, etc. rather than worrying about efficiency. If you need to write the code from scratch, I think a quadtree would be the easiest to implement. On the other hand, the Wikipedia article on the R-tree seems more specifically targeted towards mapping than the ones on the k-d tree or quadtree.
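To give an idea of how little code a from-scratch quadtree needs, here is a minimal point quadtree sketch. It assumes planar coordinates inside a known bounding box; the class name, `capacity` and `max_depth` are choices made for this illustration, not taken from any particular library.

```python
class QuadTree:
    """Point quadtree over a fixed box (minx, miny, maxx, maxy)."""

    def __init__(self, box, capacity=4, depth=0, max_depth=16):
        self.box = box
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth            # guards against endless splits
        self.points = []
        self.children = None

    def insert(self, p):
        minx, miny, maxx, maxy = self.box
        if not (minx <= p[0] <= maxx and miny <= p[1] <= maxy):
            return False                      # point lies outside this node
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append(p)
                return True
            self._split()
        # any() stops at the first child that accepts the point.
        return any(child.insert(p) for child in self.children)

    def _split(self):
        minx, miny, maxx, maxy = self.box
        cx, cy = (minx + maxx) / 2, (miny + maxy) / 2
        quadrants = ((minx, miny, cx, cy), (cx, miny, maxx, cy),
                     (minx, cy, cx, maxy), (cx, cy, maxx, maxy))
        self.children = [QuadTree(q, self.capacity, self.depth + 1, self.max_depth)
                         for q in quadrants]
        old, self.points = self.points, []
        for p in old:                         # push stored points down
            any(child.insert(p) for child in self.children)

    def query(self, window, found=None):
        """Collect all points inside window = (minx, miny, maxx, maxy)."""
        if found is None:
            found = []
        minx, miny, maxx, maxy = self.box
        if window[0] > maxx or window[2] < minx or window[1] > maxy or window[3] < miny:
            return found
        for p in self.points:
            if window[0] <= p[0] <= window[2] and window[1] <= p[1] <= window[3]:
                found.append(p)
        for child in self.children or ():
            child.query(window, found)
        return found

tree = QuadTree((0, 0, 100, 100), capacity=2)
for p in [(10, 10), (20, 20), (80, 80), (15, 12), (90, 10)]:
    tree.insert(p)
visible = tree.query((0, 0, 25, 25))
```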

The main problem with this method is that it's for retrieval only, and I want distance-based filtering before adding locations to the map. The implementation is not a problem, but I don't want to implement something I don't exactly need :) – Templar Oct 04 '12 at 07:05
