5

Consider the following list of tuples: [(5,4,5), (6,9,6), (3,8,3), (7,9,8)]

I am trying to devise an algorithm to check whether there exists at least one tuple in the list whose elements are all greater than or equal to the corresponding elements of a given tuple (the needle).

For example, for a given tuple (6,5,7), the algorithm should return True, as every element of the given tuple is less than the corresponding element of the last tuple in the list, i.e. (7,9,8). However, for a given tuple (9,1,9), the algorithm should return False, as there is no tuple in the list in which each element is greater than or equal to the corresponding element of the given tuple. In particular, this is due to the first and third elements (9) of the given tuple, which are larger than the first and third elements of every tuple in the list.

A naive algorithm would loop through the tuples in the list one by one, and loop through the elements of each tuple in an inner loop. Assuming there are n tuples, where each tuple has m elements, this gives a complexity of O(nm).
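
For reference, a minimal Python sketch of that naive O(nm) check. The function name `dominated_by_any` is made up for illustration.

```python
from typing import Iterable, Sequence


def dominated_by_any(needle: Sequence[int],
                     haystack: Iterable[Sequence[int]]) -> bool:
    """Return True if some tuple in haystack is >= needle in every position."""
    return any(
        # inner loop: compare the m corresponding elements
        all(h >= x for h, x in zip(candidate, needle))
        # outer loop: visit the n tuples one by one
        for candidate in haystack
    )


tuples = [(5, 4, 5), (6, 9, 6), (3, 8, 3), (7, 9, 8)]
print(dominated_by_any((6, 5, 7), tuples))  # True, because of (7, 9, 8)
print(dominated_by_any((9, 1, 9), tuples))  # False
```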

I am wondering whether it is possible to have an algorithm that performs the task with a lower complexity. Pre-processing or any fancy data structure to store the data is allowed!

My original thought was to make use of some variant of binary search, but I can't seem to find a data structure that allows us to avoid falling back to the naive solution once we have eliminated some tuples based on the first element, which implies that this algorithm could potentially be O(nm) in the end as well.

Thanks!

Ted Hopp
Paul
  • `check whether [at least one tuple in the list has all corresponding elements greater than or equal to a given tuple (needle)` I don't get `as every element in the given tuple is` **`less`** `than the first tuple` - 2nd & last tuple are *strictly non-smaller*. – greybeard Jan 24 '20 at 08:15
  • (@WalterTross I don't think it was *that* kind of typo: Taking the first condition from my above comment as definitive, the first needle should be answered *True*, just not for the 1st, but for the 2nd (& 4th) tuple.) – greybeard Jan 24 '20 at 23:34
  • This problem is called offline dominance reporting in the computational geometry literature. There are good solutions known for small m. – David Eisenstat Jan 28 '20 at 14:29

4 Answers

3

Consider the 2-tuple version of this problem. Each tuple (x,y) corresponds to an axis-aligned rectangle on the plane with upper right corner at (x,y) and lower left at (-oo,-oo). The collection corresponds to the union of these rectangles. Given a query point (needle), we need only determine if it's in the union. Knowing the boundary is sufficient for this. It's an axis-aligned polyline that's monotonically non-increasing in y with respect to x: a "downward staircase" in the x direction. With any reasonable data structure (e.g. an x-sorted list of points on the polyline), it's simple to make the decision in O(log n) time for n rectangles. It's not hard to see how to construct the polyline in O(n log n) time by inserting rectangles one at a time, each with O(log n) work.

Here's a visualization. The four dots are input tuples. The area left and below the blue line corresponds to "True" return values:

Tuples A, B, C affect the boundary. Tuple D doesn't.
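
A rough Python sketch of the 2-tuple staircase idea. It builds the boundary with one sort and one sweep instead of inserting rectangles one at a time, which is a simplification of what is described above; the names `build_staircase` and `query` are made up for illustration.

```python
from bisect import bisect_left


def build_staircase(points):
    """Keep only the points on the boundary, sorted by increasing x.

    Sweeping from the largest x downwards, a point is on the boundary
    only if its y is strictly larger than every y seen so far.
    Cost: O(n log n) for the sort.
    """
    frontier = []
    best_y = float("-inf")
    for x, y in sorted(points, key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            frontier.append((x, y))
            best_y = y
    frontier.reverse()  # now x is increasing and y is strictly decreasing
    xs = [p[0] for p in frontier]
    ys = [p[1] for p in frontier]
    return xs, ys


def query(staircase, needle):
    """Return True if some input point is >= needle in both coordinates."""
    xs, ys = staircase
    qx, qy = needle
    i = bisect_left(xs, qx)  # first boundary point with x >= qx
    # Among boundary points with x >= qx, the first one has the largest y.
    return i < len(xs) and ys[i] >= qy


stairs = build_staircase([(1, 6), (3, 4), (2, 5), (2, 3)])
print(query(stairs, (2, 5)))  # True
print(query(stairs, (4, 3)))  # False
```

Each query is a single binary search, so it costs O(log n) once the staircase is built.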

So the question is whether this 2-tuple version generalizes nicely to 3. The union of semi-infinite axis-aligned rectangles becomes a union of rectangular prisms instead. The boundary polyline becomes a 3d surface.

There exist a few common ways to represent problems like this. One is as an octree. Computing the union of octrees is a well-known standard algorithm and fairly efficient. Querying one for membership requires O(log k) time where k is the biggest integer coordinate range contained in it. This is likely to be the simplest option. But octrees can be relatively slow and take a lot of space if the integer domain is big.

Another candidate without these weaknesses is a Binary Space Partition, which can handle arbitrary dimensions. BSPs use (hyper)planes of dimension n-1 to recursively split n-d space. A tree describes the logical relationship of the planes. In this application, you'll need 3 planes per tuple. The intersection of the "True" half-spaces induced by the planes will be the True semi-infinite prism corresponding to the tuple. Querying a needle is traversing the tree to determine if you're inside any of the prisms. Average case behavior of BSPs is very good, but worst case size of the tree is terrible: O(n) search time over a tree of size O(2^n). In real applications, tricks are used to find BSPs of modest size at creation time, starting with randomizing insertion order.

K-d trees are another tree-based space partitioning scheme that could be adapted to this problem. This will take some work, though, because most presentations of k-d trees are concerned with searching for points, not representing regions. They'd have the same worst case behavior as BSPs.
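
One possible adaptation, sketched in Python under my own assumptions rather than taken from any standard presentation: store the points in an ordinary k-d tree, and additionally keep in each node the elementwise maximum of its subtree so that whole subtrees can be skipped. `Node`, `build` and `exists_dominating` are illustrative names, and the worst case is still linear.

```python
from typing import NamedTuple, Optional, Tuple


class Node(NamedTuple):
    point: Tuple[int, ...]
    upper: Tuple[int, ...]       # elementwise max over this subtree
    left: Optional["Node"]
    right: Optional["Node"]


def build(points, depth=0):
    """Build a k-d tree by median split, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    left = build(points[:mid], depth + 1)
    right = build(points[mid + 1:], depth + 1)
    upper = tuple(points[mid])
    for child in (left, right):
        if child is not None:
            upper = tuple(max(a, b) for a, b in zip(upper, child.upper))
    return Node(tuple(points[mid]), upper, left, right)


def exists_dominating(node, needle):
    """Return True if some stored point is >= needle in every position."""
    if node is None:
        return False
    # Prune: nothing in this subtree can reach the needle on some axis.
    if any(u < x for u, x in zip(node.upper, needle)):
        return False
    if all(p >= x for p, x in zip(node.point, needle)):
        return True
    # Try the side holding the larger split-axis values first.
    return (exists_dominating(node.right, needle)
            or exists_dominating(node.left, needle))


tree = build([(5, 4, 5), (6, 9, 6), (3, 8, 3), (7, 9, 8)])
print(exists_dominating(tree, (6, 5, 7)))  # True
print(exists_dominating(tree, (9, 1, 9)))  # False
```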

The other bad news is that these algorithms aren't well-suited to tuples much bigger than 3. Trees quickly become too big. Searching high dimensional spaces is hard and a topic of active research. However, since you didn't say anything about tuple length, I'll stop here.

Gene
  • From specific data structures to mention, let me plug [binary space partitioning (BSP) tree](https://en.m.wikipedia.org/wiki/Binary_space_partitioning) (worst case less likely and not quite as disappointing as with octtree, but (problem) space partitioning more difficult to take advantage) and [kd-tree](https://en.m.wikipedia.org/wiki/Kd-tree) (somewhere in between). – greybeard Jan 25 '20 at 06:50
  • For pre-checks, one might consider "inscribed" convex hulls as well as conventional ones. – greybeard Jan 25 '20 at 06:52
  • @greybeard Thanks. Don't know why I didn't think of BSPs. I've actually implemented one in 3d. – Gene Jan 25 '20 at 21:14
1

This kind of problem is addressed by spatial indexing systems. There are many data structures that allow your query to be executed efficiently.

Ted Hopp
0

Let S be a topologically-sorted copy of the original set of n m-tuples. Then we can use binary search for any test tuple in S, at a cost of O(m log n) per search (due to at most lg n search plies with at most m comparisons per ply).

Note, suppose there exist distinct tuples P, Q in S such that P ≤ Q (that is, no element of Q is smaller than the corresponding element of P). Then tuple P can be removed from S, since any needle that P satisfies is also satisfied by Q. In practice this often might cut the size of S to a small multiple of m, which would give O(m log m) performance; but in the worst case, it will provide no reduction at all.
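
The pruning step can be sketched in Python as follows: drop every tuple that is dominated by another one, since the dominating tuple answers every query the dropped one could. This is a simple quadratic version meant only to show the idea; `dominates` and `pareto_front` are illustrative names.

```python
def dominates(a, b):
    """True if a >= b in every position."""
    return all(x >= y for x, y in zip(a, b))


def pareto_front(tuples):
    """Return the tuples that are not dominated by any other tuple.

    Visiting tuples by decreasing sum guarantees that a dominating tuple
    is seen before anything it dominates, so one pass over the partial
    front is enough. Exact duplicates are collapsed by set().
    """
    front = []
    for t in sorted(set(tuples), key=sum, reverse=True):
        if not any(dominates(f, t) for f in front):
            front.append(t)
    return front


print(pareto_front([(5, 4, 5), (6, 9, 6), (3, 8, 3), (7, 9, 8)]))
# [(7, 9, 8)]
```

For the list from the question, only (7, 9, 8) survives, so a query needs a single m-element comparison. When no tuple dominates another, nothing is removed.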

James Waldby - jwpat7
  • 2
    Thanks for your answer jwpat7! I looked into topological sorting but I don't think I understand your first part of the answer completely, would you mind elaborating your answer a bit? For example, if there aren't any definite order between the tuples after being topologically-sorted, how does the binary search work for this case? For example, if the list = [<2,1,2>, <1,2,0>, <1,1,3>], how should we employ the binary search here for the test tuple <1,2,1>? – Paul Jun 16 '13 at 10:14
  • I don't think binary searching a non-total order is promising. Given [(1, 6), (3, 4), (2, 5)], how many linearisations are there? Which part of the list can I ignore looking for a key of, say, (4, 3)? – greybeard Jan 26 '20 at 07:58
0

Trying to answer
all corresponding elements greater than or equal to a given tuple (needle)
(using y and z for members of the set/haystack, x for the query tuple/needle, and x ≪ y when xₐ ≤ yₐ for all a (x dominated by y))

  • compute telling summary information like min, sum and max of all tuple elements
  • order criteria by selectivity
  • weed out dominated tuples
  • build a k-d-tree
  • top off with lower and upper bounding boxes:
    one tuple lower consisting of the minimum values for each element (if lower dominates x return True)
    and upper consisting of the maximum values: return False if upper does not dominate x
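
The bounding-box pre-check from the last bullet might look like this in Python. It is a sketch that assumes a non-empty list of equal-length tuples; `make_bounds` and `quick_check` are illustrative names.

```python
def make_bounds(tuples):
    """Elementwise minimum and maximum over all tuples (list must be non-empty)."""
    lower = tuple(min(col) for col in zip(*tuples))
    upper = tuple(max(col) for col in zip(*tuples))
    return lower, upper


def quick_check(needle, lower, upper):
    """Return True or False if the bounds alone decide the query, else None."""
    # Every tuple is >= lower, so a needle below lower is dominated by all.
    if all(x <= lo for x, lo in zip(needle, lower)):
        return True
    # No tuple exceeds upper anywhere, so a needle above it on any axis fails.
    if any(x > hi for x, hi in zip(needle, upper)):
        return False
    return None  # undecided: fall through to the full search


lower, upper = make_bounds([(5, 4, 5), (6, 9, 6), (3, 8, 3), (7, 9, 8)])
print(quick_check((9, 1, 9), lower, upper))  # False
print(quick_check((3, 4, 3), lower, upper))  # True
print(quick_check((6, 5, 7), lower, upper))  # None
```

Both checks cost O(m) per query and only need the two precomputed tuples.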
greybeard
  • I would not have thought it useful to look beyond [spatial indexing as suggested by Ted Hopp](https://stackoverflow.com/a/17034789/3789665). This is (just?) a summary of ideas from other answers and comments. – greybeard Jan 26 '20 at 08:29