6

I need a data structure that stores a subset—call it S—of {1, . . . , n} (n given initially) and supports just these operations:

• Initially: n is given and S = {1, . . . , n}.

• delete(i): Delete i from S. If i isn't in S, there is no effect.

• pred(i): Return the predecessor in S of i. This means max{j ∈ S | j < i}, the greatest element in S that is strictly less than i. If there is none, return 0. The parameter i is guaranteed to be in {1, . . . , n}, but may or may not be in S.

For example, if n = 7 and S = {1, 3, 6, 7}, then pred(1) returns 0, pred(2) and pred(3) return 1.

I need to figure out:

  • a data structure that represents S
  • an algorithm for initialization (O(n) time)
  • an algorithm for delete (O(α(n)) amortized time)
  • an algorithm for pred (O(α(n)) amortized time)

Would appreciate any help (I don't need code - just the algorithms).

  • Would a simple `ArrayList` not suffice here? – Abhishek Agarwal Jul 11 '17 at 04:08
  • In other words, a dynamic array right? How would the algorithms for delete and pred give O(α(n)) amortized time in this case? –  Jul 11 '17 at 04:18
  • If n is relatively small you could just use a bit set, which does not meet the pred time bound though. – Henry Jul 11 '17 at 04:19
  • Btw. what does amortized mean for pred? A pred operation does not change the set. Do we know something about the mix of pred and delete calls? – Henry Jul 11 '17 at 04:28
  • Let's say n = 7, S = {1, 3, 6, 7}, and for simplicity I drew out a table with i = 1, 2, 3, 4, 5, 6, 7. Then pred(i), in order, would be 0, 1, 1, 3, 3, 3, 6. Now, let's say we do delete(3). S = {1, 3, 7} now and if we call pred(i) on the table again we get 0, 1, 1, 1, 1, 1, 6. I don't know how much this might help but that was some additional information I was given. –  Jul 11 '17 at 04:37
  • Using such a table would make pred run in O(1). However maintenance of the table could use up to O(n^2) in total if you delete the elements in the order n, n-1, ..., 1. This gives only O(n) amortized time for delete operation. – Henry Jul 11 '17 at 04:58

3 Answers

5

You can use a disjoint-set (union-find) data structure.

Let's represent our subset as a disjoint-set structure. Each component of the disjoint-set consists of one subset element i (with zero always present as a sentinel) together with all absent numbers that are greater than i and less than the next subset element.

Example:

n = 10
s = [1, 4, 7, 8], disjoint-set = [{0}, {1,2,3}, {4,5,6}, {7}, {8, 9, 10}]
s = [3, 5, 6, 10], disjoint-set = [{0, 1, 2}, {3, 4}, {5}, {6, 7, 8, 9}, {10}]

Initially, we have a full set, represented by n+1 singleton components (zero included). As usual, every component is a rooted tree, and for every tree root we store the leftmost (smallest) number of its component.

Let leftmost(i) be the leftmost value of the component that contains i.

The leftmost(i) operation is similar to the Find operation of a disjoint-set: go from i to the root of its tree and return the leftmost number stored at the root. Complexity: O(α(n))

We can check whether i is in the subset by comparing i with leftmost(i): if they are equal (and i > 0), then i is in the subset.

pred(i) equals leftmost(i) if i is not in the subset, and leftmost(i-1) if i is in the subset. Complexity: O(α(n))

On every delete(i) operation we first check whether i is in the subset. If it is, we union the component containing i with its left neighbor component (the one containing i-1). This is the Union operation of a disjoint-set. The leftmost number of the resulting tree equals leftmost(i-1). Complexity: O(α(n))

Edit: I've just noticed "strictly less than i" in the question, changed description a bit.
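Put together, the whole scheme fits in one small class. Here is a sketch (class and member names are my own choices, not from the answer): path compression plus union by rank give the O(α(n)) amortized bounds, and a `leftmost` array indexed by root carries the minimum of each component.

```java
// Union-find over {0, 1, ..., n}; each component stores its smallest member
// ("leftmost") at its root. Path compression + union by rank give the
// O(alpha(n)) amortized bounds for delete and pred.
class PredecessorSet {
    private final int[] parent, rank, leftmost;
    private final boolean[] inS;

    PredecessorSet(int n) {                    // O(n) initialization
        parent = new int[n + 1];
        rank = new int[n + 1];
        leftmost = new int[n + 1];
        inS = new boolean[n + 1];
        for (int i = 0; i <= n; i++) {
            parent[i] = i;                     // n+1 singleton components
            leftmost[i] = i;
            inS[i] = true;                     // 0 acts as the sentinel element
        }
    }

    private int find(int x) {                  // Find with path compression
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }

    void delete(int i) {                       // O(alpha(n)) amortized
        if (!inS[i]) return;
        inS[i] = false;
        int a = find(i), b = find(i - 1);
        int lm = Math.min(leftmost[a], leftmost[b]);   // = leftmost(i-1)
        if (rank[a] < rank[b]) { parent[a] = b; leftmost[b] = lm; }
        else if (rank[a] > rank[b]) { parent[b] = a; leftmost[a] = lm; }
        else { parent[b] = a; rank[a]++; leftmost[a] = lm; }
    }

    int pred(int i) {                          // O(alpha(n)) amortized
        int j = inS[i] ? i - 1 : i;            // 0's component yields 0 = "none"
        return leftmost[find(j)];
    }

    public static void main(String[] args) {
        PredecessorSet s = new PredecessorSet(7);
        s.delete(2); s.delete(4); s.delete(5); // S = {1, 3, 6, 7}
        System.out.println(s.pred(2));         // 1
        System.out.println(s.pred(7));         // 6
    }
}
```

Note that the minimum is stored at whichever node ends up as the root, which is exactly the point raised in the comments below: no balancing concession is needed to keep the smallest element findable.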

DAle
  • That's an interesting approach! How would you make sure that the representative always is the smallest element? I would have thought that using some kind of balancing heuristic (union-by-rank/size) was required for the O(α(n)) time complexity, and this could make it difficult if we first deleted the elements in decreasing order (thus always making the set containing the larger elements into the larger set) – Tobias Ribizel Jul 11 '17 at 09:03
  • @TobiasRibizel, we'll simply store the minimum in the root during the union operation. It is not necessary for the smallest element to be the root – DAle Jul 11 '17 at 09:05
  • So you store the minima corresponding to the roots in a second array? Now that's a surprising use case for union-find! – Tobias Ribizel Jul 11 '17 at 09:17
  • @TobiasRibizel, yes, something like that – DAle Jul 11 '17 at 11:00
1

I'm not sure there is a data structure that can guarantee all these properties in O(α(n)) time, but a good start would be predecessor data structures like van Emde Boas trees or y-fast tries.

The vEB tree is defined recursively based on the binary representation of the element indices. Let's assume that n = 2^b for some b = 2^k.

  • If we have only two elements, store the minimum and maximum

  • Otherwise, we divide the binary representation of all the elements into the upper and lower b/2 bits.
    We build a vEB tree ('summary') for the upper bits of all elements and √n vEB trees for the lower bits (one for every choice of the upper bits). Additionally, we store the minimum and maximum element.

This gives you O(n) space usage and O(log log n) = O(k) time for search, insertion and deletion.
Note however that the constant factors involved might be very large. If your n is representable in 32 bits, I know of this report by Dementiev et al. that breaks off the recursion once the subproblems become small enough to solve more easily with other techniques.
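To make the "split into upper and lower halves of the bits" step concrete, here is a sketch of just a single level of that recursion (class name and the universe size 4096 = 2^12 are my own choices): the index splits into 6 high bits selecting one of 64 clusters and 6 low bits selecting a position inside it, with a 64-bit summary word marking the non-empty clusters. The real vEB tree replaces each `long` here with a recursive instance of itself.

```java
// One level of the vEB recursion over a universe of 4096 values:
// 64 clusters of 64 bits each, plus a summary word. Because each half-size
// subproblem fits in a long, the "recursive call" degenerates into bit tricks.
class TwoLevelBitSet {
    private final long[] cluster = new long[64]; // cluster[c] holds values c*64 .. c*64+63
    private long summary = 0;                    // bit c set <=> cluster c is non-empty

    void insert(int x) {
        cluster[x >> 6] |= 1L << (x & 63);
        summary |= 1L << (x >> 6);
    }

    void delete(int x) {
        cluster[x >> 6] &= ~(1L << (x & 63));
        if (cluster[x >> 6] == 0) summary &= ~(1L << (x >> 6));
    }

    // Greatest element strictly less than x, or -1 if none.
    int pred(int x) {
        int c = x >> 6;
        long below = cluster[c] & ((1L << (x & 63)) - 1); // bits below x in x's cluster
        if (below != 0) return (c << 6) | highBit(below);
        long sm = summary & ((1L << c) - 1);              // non-empty clusters before c
        if (sm == 0) return -1;
        int c2 = highBit(sm);                             // nearest non-empty cluster
        return (c2 << 6) | highBit(cluster[c2]);          // its maximum element
    }

    private static int highBit(long v) {                  // index of highest set bit
        return 63 - Long.numberOfLeadingZeros(v);
    }

    public static void main(String[] args) {
        TwoLevelBitSet s = new TwoLevelBitSet();
        s.insert(5); s.insert(100); s.insert(1000);
        System.out.println(s.pred(1000));  // 100
        System.out.println(s.pred(5));     // -1
    }
}
```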

The idea of y-fast tries builds on x-fast tries:
they are most simply described as a trie based on the binary representation of their elements, combined with a hash table for every level and some additional pointers.

y-fast tries reduce the space usage by splitting the elements into nearly equally-sized partitions and choosing a representative (the maximum) from each, over which an x-fast trie is built. Searches within the partitions are then handled by ordinary balanced search trees.

The space usage and time complexity are comparable to those of vEB trees. I'm guessing the constant factors will be a bit smaller than for a naïve implementation of vEBs, but that claim is only based on intuition.

A last note: always keep in mind that log log n < 6 for every practically relevant n, which will probably not change in the near future.

Tobias Ribizel
0

Achieving O(α(n)) time here really becomes tricky. Here is my idea for approaching this:

  1. Since we know the range of i, which is from 1 to n, we can first build a self-balancing BST such as an AVL tree. The nodes of this AVL tree would be objects of DataNode. Here is how it might look:

    public class DataNode {
        int value;      // a number in the range 1..n
        boolean type;   // true if the value is currently present in S

        DataNode(int value, boolean type) {
            this.value = value;
            this.type = type;
        }
    }
    

    The tree would contain one node for each value in the range 1 to n. The type field would be set to true if the item is present in the set S, and false otherwise.

This would take O(n) time for creation. Deletion (setting type to false) can be done in O(log n) time. For pred(i), the average-case time complexity should be around O(log n), if I am correct. The algorithm for pred(i) would be something like this:

  1. Locate the element i in the tree. If its type is true, return the inorder predecessor of i, provided that the predecessor's type value is also true.
  2. If that is false, recur on the next predecessor (i.e., the predecessor of i-1) until we find an element whose type = true.
  3. If no predecessor with type = true is found, return 0.

I hope we can optimize this approach further to make pred(i) run in O(α(n)).
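For comparison, the O(log n) part of this approach is available directly from the standard library: java.util.TreeSet is a red-black tree, and its lower(i) method already returns the greatest element strictly less than i. Storing only the elements still in S (instead of keeping a type flag on every node) also avoids the repeated-predecessor scan in step 2. A sketch (the class name is mine):

```java
import java.util.TreeSet;

// O(log n) delete/pred via java.util.TreeSet (a red-black tree). Only the
// elements still in S are stored, so lower(i) is exactly the strict
// predecessor and no type flags are needed.
class TreeSetPred {
    private final TreeSet<Integer> s = new TreeSet<>();

    TreeSetPred(int n) {                 // O(n log n) initialization
        for (int i = 1; i <= n; i++) s.add(i);
    }

    void delete(int i) { s.remove(i); }  // O(log n)

    int pred(int i) {                    // O(log n): greatest j in S with j < i
        Integer p = s.lower(i);
        return p == null ? 0 : p;
    }

    public static void main(String[] args) {
        TreeSetPred t = new TreeSetPred(7);
        t.delete(2); t.delete(4); t.delete(5);  // S = {1, 3, 6, 7}
        System.out.println(t.pred(7));          // 6
        System.out.println(t.pred(1));          // 0
    }
}
```

This still falls short of the requested O(α(n)) bound, but it is a compact baseline to compare against.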

CodeHunter