Number of items in an (x,y) interval in (logn)(logn) time

Question

Homework

I need a data structure and algorithm that returns the number of elements within a range defined by two (x,y) points (i.e. the number of elements that fall within a rectangular range on the xy-plane) in O(logn*logn) time.

I am considering two possibilities: a kd-tree and a range tree. A kd-tree seems well suited for this because it can find elements within the range in O(logn + k) (for k elements it needs to report). But I don't need to report the elements; I only need to compute the number of elements that are within the range.

A range tree could work in that each node could have a property that holds how many elements are less than it. This way, I can determine how many elements lie between two values in O(logn) time (by going to the two boundaries and taking the difference of their counts). However, I don't think this will work for data sets that have both an x and a y dimension.
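The one-dimensional idea in the paragraph above can be illustrated with a small sketch. It assumes a static data set and uses a sorted Python list with `bisect` as a stand-in for a balanced tree with per-node counts; the function name `count_in_interval` is made up for this example.

```python
from bisect import bisect_left, bisect_right

def count_in_interval(sorted_values, lo, hi):
    """Count values v with lo <= v <= hi in an already sorted list.

    Two binary searches, so O(log n) per query. This is the
    "difference of ranks at the two boundaries" idea in one dimension.
    """
    # Number of values <= hi, minus number of values < lo.
    return bisect_right(sorted_values, hi) - bisect_left(sorted_values, lo)
```

For example, `count_in_interval([1, 3, 3, 5, 8, 13], 3, 8)` counts 3, 3, 5 and 8. The open question is how to apply the same trick when a second coordinate also has to be restricted.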

Am I on the right track?

Please make the problem specification clearer. Do you have an `N`x`N` grid, each cell of which may contain an "element"? — Mikhail, Jul 30 '14 at 06:23

score 0 · Answer 1 · 2014-07-30T06:27:47.183

What you have described is an online two-dimensional orthogonal range counting problem. "Online" indicates that, after the data has been pre-processed, queries arrive one after another. "Orthogonal" indicates that the ranges are axis-aligned rectangles. And, as opposed to range reporting, range counting only counts the number of items that fall within the range.

A k-d tree with each node storing the total number of nodes under it can perform range counting in O(n^(1-1/k)) in the worst case, where k is the number of dimensions. This is because any orthogonal range can intersect at most O(n^(1-1/k)) leaves of a k-d tree. In the 2-d case, this means that a range counting query can be performed in O(sqrt(n)), which is worse than the required O((log(n))^2).
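A minimal sketch of a 2-d k-d tree with subtree counts, as described above, might look like the following. The names `build_kd` and `kd_count`, the dict-based node layout, and the stored bounding boxes are assumptions made for illustration; closed query rectangles are assumed.

```python
def build_kd(points, depth=0):
    """Build a 2-d k-d tree from a list of (x, y) tuples.

    Each node stores its splitting point, the number of points in its
    subtree, and the bounding box of those points.
    """
    if not points:
        return None
    axis = depth % 2  # alternate between x (0) and y (1)
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return {
        "point": pts[mid],
        "size": len(pts),
        "bbox": (min(xs), max(xs), min(ys), max(ys)),
        "left": build_kd(pts[:mid], depth + 1),
        "right": build_kd(pts[mid + 1:], depth + 1),
    }

def kd_count(node, x1, x2, y1, y2):
    """Count points with x1 <= x <= x2 and y1 <= y <= y2."""
    if node is None:
        return 0
    bx1, bx2, by1, by2 = node["bbox"]
    # Subtree entirely outside the query rectangle: contributes nothing.
    if bx2 < x1 or bx1 > x2 or by2 < y1 or by1 > y2:
        return 0
    # Subtree entirely inside: use the stored count, no further descent.
    if x1 <= bx1 and bx2 <= x2 and y1 <= by1 and by2 <= y2:
        return node["size"]
    # Partial overlap: test this node's own point, then recurse.
    px, py = node["point"]
    here = 1 if (x1 <= px <= x2 and y1 <= py <= y2) else 0
    return (here
            + kd_count(node["left"], x1, x2, y1, y2)
            + kd_count(node["right"], x1, x2, y1, y2))
```

The stored counts avoid visiting subtrees that lie fully inside the rectangle, but the subtrees that straddle its boundary still have to be explored, which is where the O(sqrt(n)) worst case comes from.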

Your third paragraph does not make sense, as the range tree is defined for higher dimensions as well. In fact, it's the textbook solution to high-dimensional orthogonal range counting problems. It solves online two-dimensional orthogonal range counting in exactly O((log(n))^2) query time. I recommend reading the seminal paper Multidimensional Divide-and-Conquer by Jon Louis Bentley, one of several people who independently discovered the range tree. The relevant section is 2.1.2.
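For orientation only, here is one possible sketch of a static two-level structure with O((log(n))^2) counting queries. It is not taken from Bentley's paper: it lays the range tree out as a segment tree over the x-sorted points, with each node keeping a sorted list of the y-values in its span (sometimes called a merge sort tree). The class name `RangeTree2D` and its methods are hypothetical, and closed query rectangles are assumed.

```python
from bisect import bisect_left, bisect_right
from heapq import merge

class RangeTree2D:
    """Static 2-d range counting over a list of (x, y) tuples."""

    def __init__(self, points):
        self.pts = sorted(points)              # primary order: by x
        self.xs = [p[0] for p in self.pts]
        self.n = len(self.pts)
        # ys[node] holds the sorted y-values of the points in node's span.
        self.ys = [[] for _ in range(4 * max(1, self.n))]
        if self.n:
            self._build(1, 0, self.n - 1)

    def _build(self, node, lo, hi):
        if lo == hi:
            self.ys[node] = [self.pts[lo][1]]
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid)
        self._build(2 * node + 1, mid + 1, hi)
        # Merge the children's sorted y-lists in linear time.
        self.ys[node] = list(merge(self.ys[2 * node], self.ys[2 * node + 1]))

    def count(self, x1, x2, y1, y2):
        """Count points with x1 <= x <= x2 and y1 <= y <= y2."""
        # Translate the x-range into an index range of the x-sorted array.
        l = bisect_left(self.xs, x1)
        r = bisect_right(self.xs, x2) - 1
        if l > r:
            return 0
        return self._query(1, 0, self.n - 1, l, r, y1, y2)

    def _query(self, node, lo, hi, l, r, y1, y2):
        if r < lo or hi < l:
            return 0
        if l <= lo and hi <= r:
            # Canonical node: a 1-d rank difference on its y-list.
            ys = self.ys[node]
            return bisect_right(ys, y2) - bisect_left(ys, y1)
        mid = (lo + hi) // 2
        return (self._query(2 * node, lo, mid, l, r, y1, y2)
                + self._query(2 * node + 1, mid + 1, hi, l, r, y1, y2))
```

An x-range decomposes into O(log(n)) canonical nodes, and each of them answers its y-restriction with two binary searches, which gives the O((log(n))^2) query bound. The price is O(n log(n)) space, since every point appears in one y-list per level.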

As this is a homework question, I won't go into the details, but I've probably already said too much.
