4

I'm working on a personal project to do with computational geometry. The question in the title is an abstraction of one of the small subproblems that I am trying, but struggling, to solve efficiently. Hopefully it's general enough to be of use to more than just me!


The problem

Imagine we have a set S of rectangles in the plane, all of which have edges parallel to the coordinate axes (no rotations). For my problem we'll assume that rectangle intersections are very common. But they are also very nice: if two rectangles intersect, we can assume one of them always completely contains the other. So there are no "partial" overlaps.

I want to store these rectangles in a way that:

  • We can efficiently add new rectangles.
  • Given a query point (x,y) we can efficiently report back the rectangle of smallest area that contains the point.

The illustration provides the motivation for the latter. We always want to find the most deeply nested rectangle that contains the query point, so that's always the one of smallest area.



My thoughts

So I know that both R-Trees and Quad-Trees are often used for spatial indexing problems, and indeed both can work well in some cases. The problem with R-Trees is that they can degrade to linear performance in the worst case.

I thought about building a set of balanced binary trees based on nestedness. The left subtree of node r contains all the rectangles that are inside rectangle r. The right subtree contains all the rectangles that r is inside of. The illustrated example would have three trees.

But what if none of the rectangles are nested? Then you need O(n) trees of one element each, and again we have something that performs just as poorly as a linear scan through the boxes.


How could I solve this so that queries take asymptotically sublinear time in the worst case? Even if that means sacrificing some performance in the best cases, or higher storage requirements. (I assume that for a problem like this there may be a need to maintain two data structures, and that's cool.)

I am certain that the very specific way in which rectangles are allowed to intersect should help make this problem possible. In fact, it looks like a candidate for logarithmic performance to me but I'm just not getting anywhere.

Thanks in advance for any ideas you have!

Jay

4 Answers

4

I'd suggest storing the rectangles per nesting level, and tackling the rectangle-finding per level. Once you've found which top-level rectangle the point is in, you can then look at the second-level rectangles that are inside that rectangle, find the rectangle the point is in using the same method, then look at the third-level, and so on.

To avoid a worst-case of O(n) to find the rectangle, you could use a sort of ternary spatial tree, where you repeatedly draw a vertical line across the space and divide the rectangles into three groups: those to the left (blue), those intersected by (red), and those to the right (green) of the line. For the group of intersected rectangles (or once a vertical line would intersect most or all of the rectangles), you switch to a horizontal line and divide the rectangles into groups above, intersected by, and below the line.

[Image: ternary spatial tree]

You would then repeatedly check whether the point is to the left/right or above/below the line, and go on to check the rectangles on the same side and those intersected by the line.

In the example, only four rectangles would actually need to be checked to find which rectangle contains the point.


If we use the following numbering for the rectangles in the example:

[Image: rectangle numbering]

then the ternary spatial tree would be something like this:

[Image: ternary spatial tree]
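The splitting and searching described above could be sketched roughly as follows. This is only a minimal illustration under some assumptions: all names (`TernaryTreeSketch`, `Rect`, `Node`, `build`, `query`) are made up for the example, the line is placed through the centre of the median rectangle (one possible heuristic), and instead of storing rectangles per nesting level the query simply keeps the smallest rectangle it finds. It does not change the worst case when the query point lies in every rectangle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: every name below is hypothetical.
final class TernaryTreeSketch {

    // Axis-aligned rectangle with inclusive integer bounds.
    static final class Rect {
        final int x1, y1, x2, y2;
        Rect(int x1, int y1, int x2, int y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }
        boolean contains(int px, int py) {
            return x1 <= px && px <= x2 && y1 <= py && py <= y2;
        }
        long area() { return (long) (x2 - x1) * (y2 - y1); }
        int lo(boolean vertical) { return vertical ? x1 : y1; }
        int hi(boolean vertical) { return vertical ? x2 : y2; }
    }

    static final class Node {
        boolean vertical;     // true: split line is x = split; false: y = split
        int split;
        Node low, mid, high;  // left/below the line, cut by it, right/above it
        List<Rect> leaf;      // non-null only for leaves, which are scanned linearly
    }

    static Node build(List<Rect> rects) {
        return build(rects, true, false);
    }

    private static Node build(List<Rect> rects, boolean vertical, boolean retried) {
        Node n = new Node();
        if (rects.size() <= 2) {
            n.leaf = rects;
            return n;
        }
        // Assumed heuristic: put the line through the centre of the median rectangle.
        List<Rect> sorted = new ArrayList<>(rects);
        sorted.sort(Comparator.comparingInt((Rect q) -> q.lo(vertical)));
        Rect m = sorted.get(sorted.size() / 2);
        int split = m.lo(vertical) + (m.hi(vertical) - m.lo(vertical)) / 2;

        List<Rect> low = new ArrayList<>(), mid = new ArrayList<>(), high = new ArrayList<>();
        for (Rect r : rects) {
            if (r.hi(vertical) < split) low.add(r);
            else if (r.lo(vertical) > split) high.add(r);
            else mid.add(r);
        }
        if (mid.size() == rects.size()) {
            // The line cuts every rectangle: try the other axis once, else fall back to a scan.
            if (!retried) return build(rects, !vertical, true);
            n.leaf = rects;
            return n;
        }
        n.vertical = vertical;
        n.split = split;
        n.low = build(low, vertical, false);
        n.mid = build(mid, !vertical, false); // rectangles cut by the line switch axis
        n.high = build(high, vertical, false);
        return n;
    }

    // Smallest-area rectangle containing (px, py), or null if there is none.
    static Rect query(Node n, int px, int py) {
        if (n.leaf != null) {
            Rect best = null;
            for (Rect r : n.leaf)
                if (r.contains(px, py)) best = smaller(best, r);
            return best;
        }
        int p = n.vertical ? px : py;
        Rect best = query(n.mid, px, py);  // cut rectangles are always candidates
        if (p < n.split) best = smaller(best, query(n.low, px, py));
        else if (p > n.split) best = smaller(best, query(n.high, px, py));
        return best;
    }

    private static Rect smaller(Rect a, Rect b) {
        if (a == null) return b;
        if (b == null) return a;
        return b.area() < a.area() ? b : a;
    }

    public static void main(String[] args) {
        Node root = build(Arrays.asList(
                new Rect(0, 0, 100, 100), new Rect(10, 10, 40, 40), new Rect(15, 15, 20, 20)));
        Rect hit = query(root, 17, 17);
        System.out.println(hit.x1 + "," + hit.y1 + " - " + hit.x2 + "," + hit.y2); // prints 15,15 - 20,20
    }
}
```

At each node only the group on the query point's side and the group cut by the line are visited, which is where the savings over a plain scan come from.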

  • If all rectangles cover the entire area, how would this avoid the O(n) worst-case? I don't see a worst-case guarantee of your approach. Your example assumes that they don't overlap, but then the R tree will work pretty well already. – Has QUIT--Anony-Mousse Mar 31 '17 at 19:34
  • @Anony-Mousse I'm not assuming the rectangles aren't nested, I'm just suggesting to store and search them per level. But you're right that my suggestion for a ternary tree only improves searches per level, and if the query point is in every rectangle, they all have to be considered, so still O(n). – m69's been on strike for years Apr 01 '17 at 01:10
1

You can partition the area from xMin to xMax and yMin to yMax along the edges of the rectangles. This gives at most (2n - 1)^2 fields. Each field is either completely empty or occupied by the visible part of a single rectangle. Now you can easily build a tree structure with links to the top rectangle (e.g. count the number of partitions in the x and y directions, split in the middle along whichever direction has more, create a node, and proceed recursively). The lookup then takes O(log n^2) = O(log n), which is sublinear, and the data structure takes O(n^2) space.

Here is a better implementation in terms of complexity: because the x and y indices can be searched separately, finding the rectangle on top is only O(log n) no matter how the rectangles are configured, and it is fairly simple to implement:

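import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RectangleFinder {
    // Rectangle is assumed to be a user-defined class with int getters getX(), getY(),
    // getWidth() and getHeight() (the getters of java.awt.Rectangle return double).
    private final Rectangle[] rectangles;
    private final int[] x, y;      // sorted, distinct edge coordinates
    private final Rectangle[][] r; // r[i][j]: top rectangle of the field starting at (x[i], y[j])

    public RectangleFinder(Rectangle[] rectangles) {
        this.rectangles = rectangles;
        Set<Integer> xPartition = new HashSet<>(), yPartition = new HashSet<>();
        for (Rectangle rect : rectangles) {
            xPartition.add(rect.getX());
            yPartition.add(rect.getY());
            xPartition.add(rect.getX() + rect.getWidth());
            yPartition.add(rect.getY() + rect.getHeight());
        }
        x = new int[xPartition.size()];
        y = new int[yPartition.size()];
        r = new Rectangle[x.length][y.length];
        int c = 0;
        for (int v : xPartition)
            x[c++] = v;
        c = 0;
        for (int v : yPartition)
            y[c++] = v;
        Arrays.sort(x);
        Arrays.sort(y);
        // Precompute the top rectangle for every field of the partition
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < y.length; j++)
                r[i][j] = rectangleOnTop(x[i], y[j]);
    }

    // Returns the smallest rectangle containing (px, py), or null if there is none.
    public Rectangle find(int px, int py) {
        // Points left of or below every edge lie outside the partitioned area
        if (x.length == 0 || px < x[0] || py < y[0])
            return null;
        return r[getIndex(px, x, 0, x.length)][getIndex(py, y, 0, y.length)];
    }

    // Binary search for the index of the largest element of arr that is <= n
    private int getIndex(int n, int[] arr, int start, int len) {
        if (len <= 1)
            return start;
        int mid = start + len / 2;
        if (n < arr[mid])
            return getIndex(n, arr, start, len / 2);
        else
            return getIndex(n, arr, mid, len - len / 2);
    }

    // Assumed helper (not spelled out above): linear scan for the smallest rectangle
    // containing (px, py). Rectangles are treated as half-open, [x, x + width) by [y, y + height).
    private Rectangle rectangleOnTop(int px, int py) {
        Rectangle best = null;
        for (Rectangle c : rectangles) {
            boolean inside = px >= c.getX() && px < c.getX() + c.getWidth()
                    && py >= c.getY() && py < c.getY() + c.getHeight();
            if (inside && (best == null
                    || (long) c.getWidth() * c.getHeight() < (long) best.getWidth() * best.getHeight()))
                best = c;
        }
        return best;
    }
}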
maraca
1

Pretty much any index can degrade to worst case O(n).

The question is whether you will ever have such harmful data, and whether you optimize for worst case, or for real data.

Consider n identically sized, overlapping rectangles, and a point in the intersection... you won't have much chance for optimization here.

The R tree is a quite good choice. You can do a priority search, and prefer the smaller rectangles.

But your sketches indicate that your rectangles may be usually nested, rather than overlapping. The standard R-tree does not handle this very well. Instead, you may need to modify the R tree to use exactly this structure, and store only the nested rectangles as part of the parent.

Has QUIT--Anony-Mousse
  • There are no partially overlapping rectangles, as stated in the question. However, the rectangles may all be inside each other, and the query point inside all of them, so your point still stands. – m69's been on strike for years Apr 01 '17 at 01:56
  • If the rectangles are all inside each other it's actually pretty simple to get O(log n). Check if the point is inside the rectangle at middle depth: if yes, the point can only be in this or a higher rectangle; if no, it has to be in a deeper rectangle. Repeat recursively. – maraca Apr 01 '17 at 13:58
1

How about a PH-Tree? The PH-Tree is essentially a trie shaped like a quadtree, but with some unique properties that may be ideal for your case, such as very efficient updates and a high likelihood of locality of small rectangles.

Basics:

  • The PH-Tree is a bit-level trie, which means it splits in all dimensions at every bit position. This means that, for 64-bit floating point data, the maximum depth of the tree is 64.
  • The tree is implicitly z-ordered.
  • Query speed is usually comparable to R*Tree or STR-Tree; for your case it may be considerably faster, see below.
  • Insertion/deletion speed is equal to or better than STR-Trees and better than any other R-Tree type that I am aware of.
  • Tree shape is determined only by the data, not by insertion order. That means there will never be any costly rebalancing. In fact, the tree guarantees that any insertion or deletion will never affect more than two nodes (with child/parent relationship).

Storing rectangles: The PH-Tree can only store vectors of data, i.e. points. In order to store (axis-aligned) rectangles, it takes by default the 'lower left' and 'upper right' corners and puts these into a single vector. For example, a 2D rectangle (2,2)-(4,5) is stored as a 4-dim vector (2,2,4,5). It may not be obvious, but this representation still allows for efficient queries, such as window queries and nearest neighbor queries, see some results here and some more explanation here.
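To make the encoding a bit more concrete, here is a small sketch of why a point lookup turns into a window query on the 4-dim keys. It deliberately does not use the PH-Tree API; the class and method names (`RectAsPointSketch`, `encode`, `windowMin`, `windowMax`, `inWindow`) are made up for illustration, and integer coordinates are assumed. A rectangle (x1,y1)-(x2,y2) contains the point (px,py) exactly when x1 <= px, y1 <= py, x2 >= px and y2 >= py, which is a box in key space.

```java
// Illustrative sketch only: hypothetical names, not the PH-Tree API.
final class RectAsPointSketch {

    // Rectangle (x1,y1)-(x2,y2) stored as the 4-dim key (x1, y1, x2, y2).
    static long[] encode(long x1, long y1, long x2, long y2) {
        return new long[] {x1, y1, x2, y2};
    }

    // Lower corner of the 4-dim window that holds every rectangle containing (px, py):
    // x1 and y1 are unbounded below, x2 and y2 must be at least px and py.
    static long[] windowMin(long px, long py) {
        return new long[] {Long.MIN_VALUE, Long.MIN_VALUE, px, py};
    }

    // Upper corner of the same window: x1 and y1 must be at most px and py,
    // x2 and y2 are unbounded above.
    static long[] windowMax(long px, long py) {
        return new long[] {px, py, Long.MAX_VALUE, Long.MAX_VALUE};
    }

    // Plain per-dimension range check, standing in for the index's window query.
    static boolean inWindow(long[] key, long[] min, long[] max) {
        for (int d = 0; d < key.length; d++)
            if (key[d] < min[d] || key[d] > max[d]) return false;
        return true;
    }

    public static void main(String[] args) {
        long[] key = encode(2, 2, 4, 5);
        System.out.println(inWindow(key, windowMin(3, 3), windowMax(3, 3))); // prints true
        System.out.println(inWindow(key, windowMin(5, 3), windowMax(5, 3))); // prints false
    }
}
```

A real index would answer the window query by traversing the tree rather than testing keys one by one; the check above only shows which keys qualify.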

The tree cannot directly store the same rectangle twice. Instead you would have to associate a counter with each 'key'. For the special case with 'n' identical rectangles, this actually has the advantage that the resulting tree would contain only one key, so overlap with the smallest rectangle could be determined in almost constant time.

Query performance: As can be seen from the performance results, the PH-Tree is (depending on the dataset) fastest with small query windows that return few results (here, Figure 16). I'm not sure whether the performance benefit is connected to the small query window size or the small result size. But if it is connected to the first, then your queries should be very fast, because essentially your query window is a point.

Optimising for small rectangle size: Due to the encoding of rectangles into a single vector, the smallest rectangle is likely (guaranteed??) to be in the same leaf node that would also contain your search point. Usually, queries are traversed in z-order, so to exploit locality of small rectangles, you would need to write a special query. This should not be hard; I think you could simply use the PH-Tree k-nearest-neighbor implementation and provide a custom distance function. The current kNN starts by locating the node with the search point and then extends the search area until it has found all nearest neighbors. I do believe that using a custom distance function should be sufficient, but you may have to do some research to prove it.

The complete code (Java) of the PH-Tree is available in the link above. For comparison, you may want to check out my other index implementations here (R*Tree, quadtrees, STR-Tree).

TilmannZ
  • Yes, that is me. I noticed a while ago that the location of rectangles in the tree depends partially on their size. For example, nodes that are close to the spatial diagonal (0,0,0...)/(MAX,MAX,MAX,...) are all small. The inverse is only partially true, rectangles with maximum distance from the diagonal can be small or large, depending on the quadrant. I never had time to look further into this, let alone finding a use case. If you are interested I can provide some more details about this behavior. – TilmannZ Apr 03 '17 at 18:37