I am trying to understand the basics of the R-tree algorithm, and in particular how it performs a search such as "all restaurants within 1 km". We would have all objects stored in rectangles in our database. We would then (probably) build a query rectangle based on our current position and find all rectangles that overlap with it. Would we then scan through the results to find the ones of interest, i.e. only the objects that are restaurants?
1 Answer
Yes, this is basically how range queries on R-trees work: if a rectangle overlaps with your query region, expand it (look at its contents, which are either further rectangles or points). Otherwise, ignore it. Overlap testing is simple for rectangle-to-rectangle; for spherical queries you need to compute the minimum distance from the sphere center to the rectangle ("minDist") and compare it with the query radius.
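The traversal described above can be sketched in a few lines. This is a minimal in-memory illustration, not a real R-tree implementation: the `Rect` and `Node` classes, their field names, and the sample tree are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from math import hypot

@dataclass
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def overlaps(self, other: "Rect") -> bool:
        # Two axis-aligned rectangles overlap iff they overlap on every axis.
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)

    def min_dist(self, x: float, y: float) -> float:
        # Lower bound on the distance from (x, y) to anything inside the rectangle.
        dx = max(self.xmin - x, 0.0, x - self.xmax)
        dy = max(self.ymin - y, 0.0, y - self.ymax)
        return hypot(dx, dy)

@dataclass
class Node:
    mbr: Rect                                      # bounding rectangle of the page
    children: list = field(default_factory=list)   # child Nodes (inner page)
    points: list = field(default_factory=list)     # (x, y, payload) tuples (leaf page)

def range_query(node: Node, query: Rect, out: list) -> None:
    if not node.mbr.overlaps(query):
        return                                     # prune: nothing below can match
    for x, y, payload in node.points:
        if query.xmin <= x <= query.xmax and query.ymin <= y <= query.ymax:
            out.append(payload)
    for child in node.children:
        range_query(child, query, out)

def radius_query(node: Node, cx: float, cy: float, r: float, out: list) -> None:
    if node.mbr.min_dist(cx, cy) > r:
        return                                     # prune: page is entirely outside the circle
    for x, y, payload in node.points:
        if hypot(x - cx, y - cy) <= r:
            out.append(payload)
    for child in node.children:
        radius_query(child, cx, cy, r, out)

# Hypothetical two-leaf tree.
leaf_a = Node(Rect(0, 0, 2, 2), points=[(1, 1, "cafe"), (2, 2, "diner")])
leaf_b = Node(Rect(5, 5, 8, 8), points=[(5, 5, "bistro"), (8, 8, "grill")])
root = Node(Rect(0, 0, 8, 8), children=[leaf_a, leaf_b])

in_box, in_circle = [], []
range_query(root, Rect(0.5, 0.5, 1.5, 1.5), in_box)
radius_query(root, 2.0, 2.0, 1.5, in_circle)
```

In the radius query, `leaf_b` is never opened because its `min_dist` to the center already exceeds the radius; that pruning is where the index saves work.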
k nearest neighbor queries are a bit more tricky; here you need priority queues. Always expand the best candidate (by "minDist"), until you have found k objects that are closer than the next rectangle's "minDist".
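One way to realize this is a best-first search in which rectangles and points share a single priority queue: a point that reaches the front of the queue is guaranteed to be at least as close as every rectangle still waiting. The sketch below assumes a hypothetical dict-based node layout (`"mbr"`, `"children"`, `"points"`); none of these names come from a particular library.

```python
import heapq
from itertools import count
from math import hypot

def min_dist(mbr, qx, qy):
    """Lower bound on the distance from (qx, qy) to anything inside mbr."""
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - qx, 0.0, qx - xmax)
    dy = max(ymin - qy, 0.0, qy - ymax)
    return hypot(dx, dy)

def knn(root, qx, qy, k):
    """Return up to k (distance, payload) pairs, nearest first."""
    tie = count()  # unique tie-breaker so the heap never compares nodes
    heap = [(min_dist(root["mbr"], qx, qy), next(tie), False, root)]
    result = []
    while heap and len(result) < k:
        dist, _, is_point, item = heapq.heappop(heap)
        if is_point:
            # No remaining rectangle has a smaller minDist, so this is final.
            result.append((dist, item))
            continue
        # Expand the most promising rectangle.
        for x, y, payload in item.get("points", []):
            heapq.heappush(heap, (hypot(x - qx, y - qy), next(tie), True, payload))
        for child in item.get("children", []):
            heapq.heappush(heap, (min_dist(child["mbr"], qx, qy), next(tie), False, child))
    return result

# Hypothetical two-leaf tree.
leaf_a = {"mbr": (0, 0, 2, 2), "points": [(1, 1, "cafe"), (2, 2, "diner")]}
leaf_b = {"mbr": (5, 5, 8, 8), "points": [(5, 5, "bistro"), (8, 8, "grill")]}
root = {"mbr": (0, 0, 8, 8), "children": [leaf_a, leaf_b]}

nearest_two = [name for _, name in knn(root, 0.0, 0.0, 2)]
```

For this query the second leaf is never expanded: both results are popped before its rectangle reaches the front of the queue.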
Since you can't really index the "is a restaurant" property, you'll have to either build an R-tree containing restaurants only, or filter the results by the restaurant property. (This is also how it is done e.g. in SQLite; the spatial part is indexed with an R-tree, while the restaurant property is e.g. obtained via a join or a bitmap index.)
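The two options can be compared side by side. In this sketch the spatial step is a plain linear scan standing in for the R-tree query, and the sample places, field names, and coordinates are invented for illustration.

```python
# Hypothetical sample data; names and coordinates are made up.
places = [
    {"name": "Luigi's", "kind": "restaurant", "x": 1.0, "y": 1.0},
    {"name": "City Library", "kind": "library", "x": 1.2, "y": 0.8},
    {"name": "Noodle Bar", "kind": "restaurant", "x": 9.0, "y": 9.0},
]

def spatial_candidates(items, xmin, ymin, xmax, ymax):
    """Stand-in for an R-tree range query (here a simple linear scan)."""
    return [p for p in items
            if xmin <= p["x"] <= xmax and ymin <= p["y"] <= ymax]

query = (0.0, 0.0, 2.0, 2.0)

# Option 1: one index over all objects, then filter the hits by property.
filtered_after = [p["name"] for p in spatial_candidates(places, *query)
                  if p["kind"] == "restaurant"]

# Option 2: a separate index that contains restaurants only.
restaurants_only = [p for p in places if p["kind"] == "restaurant"]
indexed_subset = [p["name"] for p in spatial_candidates(restaurants_only, *query)]
```

Both options return the same objects; the difference is whether non-restaurants are discarded before indexing or after the spatial query.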
The tricky part of an R-tree is not the query, but how to build it. There are very simple but good methods for bulk loading point data (STR, Sort-Tile-Recursive), but for an online database you need somewhat tricky methods. R*-trees outperform classic R-trees significantly in my experience; however, the reinsertions used by R*-trees are particularly tricky to implement in a real DBMS. An interesting tradeoff is to just use insert and split from R*, but not the reinsertions. On the query side, there is no difference between R and R* anyway.
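To give an idea of how simple bulk loading can be, here is a sketch of the leaf-packing step of STR for 2D points: sort by x, cut into vertical slices, sort each slice by y, and cut it into pages. The function names and the `capacity` parameter are assumptions for this example, and only the leaf level is built; upper levels would be packed the same way from the leaf rectangles.

```python
from math import ceil, sqrt

def str_pack_leaves(points, capacity):
    """Group 2D points into leaf pages of at most `capacity` points each."""
    n_leaves = ceil(len(points) / capacity)
    n_slices = ceil(sqrt(n_leaves))        # number of vertical slices
    slice_size = n_slices * capacity       # points per vertical slice
    by_x = sorted(points, key=lambda p: p[0])
    leaves = []
    for i in range(0, len(by_x), slice_size):
        # Within one vertical slice, order by y and cut into pages.
        by_y = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        for j in range(0, len(by_y), capacity):
            leaves.append(by_y[j:j + capacity])
    return leaves

def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a point list."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

# A 4x4 grid of points packed into pages of 4.
grid = [(x, y) for x in range(4) for y in range(4)]
leaves = str_pack_leaves(grid, capacity=4)
leaf_mbrs = [mbr(leaf) for leaf in leaves]
```

On this grid the result is four compact, non-overlapping leaf rectangles, which is exactly what makes queries on a bulk-loaded tree efficient.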
kd-trees: They are related to R-trees, but have some key differences. First of all, they are not designed for disk storage, but for in-memory operation only. Secondly, they are not meant to be updated (they are not balanced trees); if you have changes, you will have to rebuild them every now and then to keep the performance good. So in some cases they will perform very well (and they are fairly simple to implement), but once you get to large data and on-disk storage they are much more painful. Furthermore, they do not allow for different loading strategies.
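The balance problem is easy to reproduce. The sketch below uses a hypothetical dict-based kd-tree and compares a tree built once by median splits with one grown by naive one-at-a-time insertion of already sorted points, which is a worst case chosen for illustration.

```python
def build_balanced(points, depth=0):
    """Build a kd-tree by splitting at the median of alternating axes."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"point": pts[mid],
            "left": build_balanced(pts[:mid], depth + 1),
            "right": build_balanced(pts[mid + 1:], depth + 1)}

def insert(node, point, depth=0):
    """Naive insertion: walk down and attach a new leaf, with no rebalancing."""
    if node is None:
        return {"point": point, "left": None, "right": None}
    axis = depth % 2
    side = "left" if point[axis] < node["point"][axis] else "right"
    node[side] = insert(node[side], point, depth + 1)
    return node

def height(node):
    if node is None:
        return 0
    return 1 + max(height(node["left"]), height(node["right"]))

points = [(i, i) for i in range(63)]

balanced = build_balanced(points)

grown = None
for p in points:          # sorted input: every point goes to the right
    grown = insert(grown, p)

balanced_height = height(balanced)
grown_height = height(grown)
```

With 63 points the median-split tree has height 6, while the incrementally grown tree degenerates into a chain of height 63, so a lookup has to walk through every node.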

-
Just a note, but to say that there are more than 4 Hilbert curves is nonsense. The basic shape of a Hilbert curve has only 4 directions. Maybe you mean the Moore curve? – Micromega Aug 21 '12 at 19:12
-
Even in 2D, you can start in each corner, and then go either X first or Y first. That already makes 8 curves. And to many, the Moore curve is a Hilbert curve, too, since it consists of the same primitives. But I'm mostly interested in higher dimensionality anyway, and there the number goes up even more, because you can use different permutations of the axes in each recursion step. – Has QUIT--Anony-Mousse Aug 22 '12 at 08:29
-
You have a really strange way of counting and showing people things. – Micromega Aug 22 '12 at 09:13
-
4 corners *times* 2 axis permutations = 8 isomorphic variants (+ slight layout variations such as moore). So no counting involved here, but multiplication. – Has QUIT--Anony-Mousse Aug 22 '12 at 11:03
-
Does it satisfy the triangle inequality? – Micromega Aug 22 '12 at 11:12
-
There is no triangle inequality here. – Has QUIT--Anony-Mousse Aug 22 '12 at 11:34
-
How? It's about distances? The OP asked about distances? – Micromega Aug 22 '12 at 11:41
-
The R-tree does not use the triangle inequality. (The M-tree does!) It solely needs a point-to-rectangle lower bound. – Has QUIT--Anony-Mousse Aug 22 '12 at 12:50
-
Nice. But do you have an example of the 8 Hilbert curves? Can you point me to some internet address? I don't understand what you mean by 4 corners and 2 axis permutations = 8 isomorphic variants. Or do you have some code where you can show me how this works? IMO the Hilbert curve is U-shaped and the only thing you can do is rotate it. And there are only 4 directions? How does this axis permutation work? – Micromega Aug 22 '12 at 15:02
-
First choose a corner. You have 4 choices, right? Then choose whether to go X or Y first, and draw the hilbert curve as you would usually do. You have 8 different curves this way, although two will *look* the same way, unless you draw arrows. – Has QUIT--Anony-Mousse Aug 22 '12 at 19:14
-
That makes 4 curves and not 8 (or 6). Search for orientation hilbert curves. – Micromega Aug 22 '12 at 19:31
-
Prove otherwise. 4 times 2 is 8, and if you take direction into account - which you have to for indexing - it gives 8. – Has QUIT--Anony-Mousse Aug 22 '12 at 19:58
-
Yes, can be. Here are some other variations: https://docs.google.com/viewer?a=v&q=cache:n5mRSIv3qssJ:www4.ncsu.edu/~njrose/pdfFiles/HilbertCurve.pdf+&hl=de&gl=de&pid=bl&srcid=ADGEEShNxWRHjjQTuIaBpQor_SbMjHxz-LMCBGjQAXeaKvdBx6x_4jqDL51dB28UwKp6IyJZf8ZIUbZyjvV0IxqBJFC9HifagvZ49Mu98VjDl4uBHpkUmbzWxUuGyXbhteWnI2nLFsqN&sig=AHIEtbTh6B3m4TI88RGM_EjtYk2MjL0Qgg – Micromega Aug 22 '12 at 20:20
-
Yes, that article has some of the variations that I was referring to. I believe that many of them have the same "locality preserving" properties as the "default" hilbert curve. – Has QUIT--Anony-Mousse Aug 23 '12 at 10:04
-
In fact I wrote this in my answer but you downvoted it. The hilbert curve is locality preserving and reducing the dimensions. – Micromega Aug 23 '12 at 13:23
-
I never disagreed with that, but this is as much a reply as "the factorial function can be computed using recursion". That is also a correct fact, but not a reply to "how does an Rtree work". – Has QUIT--Anony-Mousse Aug 23 '12 at 16:51
-
You lost me. Did you read nick's spatial index hilbert quadtree blog? IMO it's a very good answer maybe not for r-trees? – Micromega Aug 23 '12 at 16:54
-
Exactly. Hilbert curves are okay. Quadtrees are okay. There are even Hilbert R-Trees. But it doesn't *help* when the *question* is "how do I perform a range query on an R-tree". It was an okay answer, but to a different question. – Has QUIT--Anony-Mousse Aug 23 '12 at 17:12
-
Hmm. What's the magic about a range query on r-tree? You say there is also hilbert r-tree? Isn't the range in an r-tree not calculated with a space filling curve? I've some problems how you define exactly or perfect? – Micromega Aug 23 '12 at 17:44
-
There is no magic. It's just about whether the rectangle overlaps the query region (rectangular or spherical). And no, unless you are using a Hilbert R-tree, no SFC is used, and no triangle inequality either. – Has QUIT--Anony-Mousse Aug 24 '12 at 07:04
-
And how can a restaurant be a rectangular shape? I thought the OP is looking for restaurants in an area subdivided by rectangles or squares? Looking for overlapping rectangles doesn't solve the OP's problem? – Micromega Aug 24 '12 at 07:13
-
The rectangles are the pages of the tree. How much do you know about r-trees? It's a well defined data structure, introduced by Guttman. With a particular query algorithm... – Has QUIT--Anony-Mousse Aug 24 '12 at 12:52
-
Not much. I just know space filling curves. I make trees up to 1-26 leafs but I didn't make a quadtree or a r-tree yet. – Micromega Aug 24 '12 at 12:54
-
@Anony-Mousse what is wrong with a kd-tree not being balanced? Why does the tree need to be balanced? – Dejell Jun 23 '13 at 18:25
-
Only balanced trees have the desired depth of log n. Have a look at how to insert into an existing k-d-tree to understand the problems. – Has QUIT--Anony-Mousse Jun 25 '13 at 20:50