
So, I'm trying to implement a data structure to handle dynamic order statistics. The data structure has the following operations:

  • add(x): inserts a new element with value x
  • get(k): returns the k-th smallest element, where k = ceiling(n/a), n = number of elements in the data structure, and a is a constant factor (e.g. with a = 3 and n = 7, get(k) returns the 3rd smallest element).
  • reset: resets the whole data structure, i.e. the data structure is "empty" after it.

I implemented my data structure using a balanced AVL tree. With this, the operations have the following time complexities:

  • add(x): O(log(n))
  • get(k): O(log(n))

Here is my implementation of get(k), which runs in O(log(n)) time:

public static int get(Node current, int k) {
    // Rank of `current` within its subtree (1-based): size of its left subtree + 1.
    int l = (current.leftChild == null ? 0 : current.leftChild.size) + 1;
    if (k == l) {
        return current.value;
    } else if (k < l) {
        if (current.leftChild == null) {
            return current.value;
        }
        return get(current.leftChild, k);
    } else {
        if (current.rightChild == null) {
            return current.value;
        }
        // Skip the left subtree and the current node when descending right.
        return get(current.rightChild, k - l);
    }
}

And here's my implementation of the Node class:

class Node {
    int height, value, bal, size;   // bal = balanceFactor,
                                    // size = amount of nodes in the tree rooted at the current node
    Node leftChild = null;
    Node rightChild = null;

    public Node(int val) {
        value = val;
        height = 1;
        size = 1;
    }
}

However, my task is to implement a data structure that handles the above operations with get(k) taking only O(1) (constant) time, while add(x) may still take O(log(n)) time. Also, I'm not allowed to use a hash map.

Is it possible to modify my implementation to get constant time? Or what kind of data structure can handle the get(k) operation in constant time?

G.M
  • To my knowledge, there are only two data structures that allow constant access time of the `k`th element: arrays and Hashmaps (which internally use arrays). – Turing85 Jan 06 '18 at 16:32
  • @Turing85 that's why I'm confused :/ – G.M Jan 06 '18 at 16:46
  • Are you sure it's constant time? Because even a hash map can't handle this... I can think of constant time O(1) if add is O(n) – Or251 Jan 06 '18 at 16:50
  • @Or251 It says: "Your algorithm should require O(log n) time per Add(x) operation, O(1) time per get() operation..." – G.M Jan 06 '18 at 16:56
  • But are you sure that's constant time? – Or251 Jan 06 '18 at 16:59
  • You've implemented the standard way of solving this problem. I'm very confident you can't modify the tree algorithm you're using so that `get` is O(1). I've never heard of a dynamic order statistic algorithm that achieves this unless `add` is O(n). I'll be extremely interested to learn if you find one. One note is that your `get` does use O(log n) space unless the compiler removes the tail recursion. You can re-implement it with a loop instead; then it will run in O(1) _space_. – Gene Jan 06 '18 at 19:48
  • @Gene please see my answer. Isn't that a get in `O(log a) = O(1)` ? – גלעד ברקן Jan 06 '18 at 23:39
  • @גלעד ברק I don't think your proposal solves the problem. Simply splitting the data into a fixed number of trees doesn't change it. There will be O(n/a) = O(n) objects in each tree. Your assertion that the k'th largest element will always be the smallest or largest of some tree doesn't make sense. For example, what if we're looking for the element at position ceiling((2k+1)/2 * n/a) for any integer k=1...? You'll always need to go to the middle of a tree, and you'll have to search for it the same way the OP is searching the single tree. – Gene Jan 07 '18 at 00:25
  • @Gene OP said `a` is a constant and we're always looking for `k = ceiling(n / a)`. Your example seems invalid. `k` is always `ceiling(n / a)` according to the question. – גלעד ברקן Jan 07 '18 at 00:36
  • @גלעד ברק Hah. He changed the question. The part about `a` is new. So I see what you want to do and I think it will work fine. – Gene Jan 07 '18 at 03:07
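
A side note on Gene's remark about the recursion: the same selection can be written as a loop, keeping the O(log n) running time but using only O(1) extra space. The sketch below is an illustration rather than code from the thread (the name getIterative is just for illustration); it assumes the size/leftChild/rightChild fields of the Node class shown in the question and 1-based ranks.

public static int getIterative(Node current, int k) {
    // Iterative order-statistic selection: O(log n) time, O(1) extra space.
    while (true) {
        int l = (current.leftChild == null ? 0 : current.leftChild.size) + 1; // rank of current
        if (k == l) {
            return current.value;
        } else if (k < l) {
            current = current.leftChild;       // the k-th smallest lies in the left subtree
        } else {
            k -= l;                            // skip the left subtree and the current node
            current = current.rightChild;
        }
    }
}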

2 Answers

1

As far as I understand, the k parameter basically grows with the number of elements, which means that for each n you know the exact value of k.

If this is the case, then my suggestion is to use a max-heap and a min-heap. The max-heap organizes the elements that are less than or equal to the ceiling(n/a)-th element, allowing access to the largest of them (the root) in constant time. Accordingly, the min-heap organizes the elements larger than the ceiling(n/a)-th element, allowing access to the smallest of them (the root) in constant time.

When new elements arrive (add), you place them in the corresponding heap in O(log n). If the max-heap becomes larger or smaller than ceiling(n/a) elements, you rebalance between the two heaps in O(log n).

Your get() function now just needs to return the root element of the max-heap in O(1).

In Java you can use a PriorityQueue for the max-heap (and the min-heap):

PriorityQueue<Integer> heap = new PriorityQueue<>(10, Collections.reverseOrder());

The class could look like this:

import java.util.Collections;
import java.util.PriorityQueue;

public class DOS
{

    double a;
    PriorityQueue<Integer> heap;
    PriorityQueue<Integer> heap_large_elements;

    public DOS(double a) {
        this.a = a;
        this.heap = new PriorityQueue<>(10, Collections.reverseOrder());
        this.heap_large_elements = new PriorityQueue<>();
    }

    public void add(int x){
        if(heap.size() == 0 || x < heap.peek())
            heap.add(x); // O(log n/a)
        else
            heap_large_elements.add(x); // O(log n)

        //possible rebalance operations
        int n = heap.size() + heap_large_elements.size();
        if(heap.size() > Math.ceil(n/a)){
            heap_large_elements.add(heap.poll()); //O(log n)
        }else if(heap.size() < Math.ceil(n/a)) {
            heap.add(heap_large_elements.poll()); //O(log n)
        }
    }

    public int get(){
        return heap.peek(); //O(1)
    }

    public static void main(String[] args)
    {
        DOS d = new DOS(3);
        d.add(5);d.add(6);d.add(2);d.add(3);d.add(8);d.add(12);d.add(9);
        System.out.println(d.get());
    }

}

Edit (by Cheaty McCheatFace):

Another idea, which lets you keep your code but is somewhat cheaty, is the following. Whenever you add an element to your AVL tree, you compute the k-th (k = ceiling(n/a)) smallest element (as done in your code) and store it. This way, the add() function still has O(log n) runtime. The get() function just retrieves the stored value and runs in O(1).
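
Below is a minimal self-contained sketch of this caching idea (not the answer's code; class and method names such as CachedDOS are only for illustration). For brevity it uses a plain size-augmented BST in place of the AVL tree; with the balanced AVL tree from the question, add() stays O(log n).

class CachedDOS {

    static class Node {
        int value, size = 1;
        Node left, right;
        Node(int v) { value = v; }
    }

    private final double a;
    private Node root;
    private int n = 0;
    private int cached;   // the k-th smallest element, recomputed on every add

    CachedDOS(double a) { this.a = a; }

    public void add(int x) {
        root = insert(root, x);               // O(log n) with a balanced tree
        n++;
        int k = (int) Math.ceil(n / a);       // k is determined by n alone
        cached = select(root, k);             // O(log n), same selection as in the question
    }

    public int get() {
        return cached;                        // O(1): just return the cached answer
    }

    private static Node insert(Node node, int x) {
        if (node == null) return new Node(x);
        if (x < node.value) node.left = insert(node.left, x);
        else node.right = insert(node.right, x);
        node.size++;                          // maintain subtree sizes for the selection
        return node;
    }

    private static int select(Node node, int k) {
        int l = (node.left == null ? 0 : node.left.size) + 1;  // rank of node in its subtree
        if (k == l) return node.value;
        if (k < l) return select(node.left, k);
        return select(node.right, k - l);     // skip the left subtree and the current node
    }

    public static void main(String[] args) {
        CachedDOS d = new CachedDOS(3);
        for (int x : new int[]{5, 6, 2, 3, 8, 12, 9}) d.add(x);
        System.out.println(d.get());          // k = ceil(7/3) = 3, prints 5
    }
}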

SaiBot
  • k is a parameter to get(), i.e., it's a different k every time – Matt Timmermans Jan 06 '18 at 18:09
  • That's what I first thought too, but "k = n/a, where n = amount of elements in the data structure and a = constant factor" makes me think it is dependent on n. – SaiBot Jan 06 '18 at 18:10
  • Ah, you're right. Well, that makes this one pretty easy, but I ask a similar question in interviews so I'm not gonna help :) – Matt Timmermans Jan 06 '18 at 18:15
  • I don't know if I did something wrong but I implemented it and tried a small example: add(5), add(6), add(2) and factor a = 3, i.e. k = ceiling(3/3) = 1, so when I call get() it should return 2, shouldn't it? Because I get 6. – G.M Jan 06 '18 at 18:54
  • That's weird, what is the size of the heap after you inserted all elements? You should try calling get() after each insert and check the result. – SaiBot Jan 06 '18 at 19:20
  • @SaiBot sorry to interrupt again but I tried the following example: add(5), add(6), add(2), add(3), add(8), add(12), add(9). Then k = ceiling(7/3) = 3, i.e. get() should return 5. However it returns 9. I think the problem is that when you remove the root (in `add()`) you simply "throw away" the number, and when the size of the heap is big enough again you just insert the next number, which is the parameter of the next add() call. However, this new number might be larger than one that was previously removed but should now be added. Correct me if I understand something wrong. – G.M Jan 06 '18 at 23:04
  • You are correct, good catch. I missed that case. It can be solved using a second heap for all elements that are removed from the first one and reinserting these elements if necessary. I will work something out tomorrow if the question is still unanswered. The answer in the edit is still valid though. – SaiBot Jan 06 '18 at 23:46
  • @G.M updated the code. It is now using two heaps that might need to rebalance after an add(). I hope I thought of all cases now :) – SaiBot Jan 07 '18 at 08:45
0

If you'd like to use trees, maintain an ordered collection of at least 1 and at most a balanced trees. For each tree, keep a pointer to its smallest and largest elements, and its size. Whenever an element is added, insert it into the appropriate tree. If a tree grows beyond ceiling(n/a) elements, reorganize the trees by moving the appropriate lowest or highest elements to a neighbouring tree, keeping them all between floor(n/a) and ceiling(n/a) elements in size. The k-th element will then always be either the smallest or the largest element of one of the trees.

Add would take O(log a + log(n/a) * a) = O(log n) time.
Get would take O(log a) = O(1) time.
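
For illustration, here is a rough sketch of this multi-tree scheme (not the answerer's code). It assumes distinct values so that java.util.TreeSet can stand in for the balanced trees, and after every add it normalizes the trees so that the first tree holds exactly ceiling(n/a) elements; get() is then simply the largest element of the first tree (O(1) with the suggested max pointer, O(log(n/a)) via TreeSet.last() here).

import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class MultiTreeDOS {

    private final int a;
    private final List<TreeSet<Integer>> trees = new ArrayList<>();  // sorted, non-overlapping ranges
    private int n = 0;

    MultiTreeDOS(int a) { this.a = a; }

    public void add(int x) {
        n++;
        if (trees.isEmpty()) trees.add(new TreeSet<>());
        // Pick the tree whose value range should contain x: O(a), or O(log a) with binary search.
        int i = 0;
        while (i < trees.size() - 1 && x > trees.get(i).last()) i++;
        trees.get(i).add(x);                                          // O(log(n/a))
        rebalance();
    }

    public int get() {
        // The first tree holds exactly ceil(n/a) elements, so its largest is the k-th smallest.
        return trees.get(0).last();
    }

    // Move boundary elements between neighbouring trees so that the first tree ends up with
    // exactly ceil(n/a) elements. The number of moves per add depends only on a, and each
    // move costs O(log(n/a)), so for constant a the rebalancing stays within O(log n).
    private void rebalance() {
        int cap = (int) Math.ceil((double) n / a);
        for (int i = 0; i < trees.size(); i++) {
            while (trees.get(i).size() > cap) {                       // overfull: push largest right
                if (i == trees.size() - 1) trees.add(new TreeSet<>());
                trees.get(i + 1).add(trees.get(i).pollLast());
            }
            while (i < trees.size() - 1 && trees.get(i).size() < cap
                    && !trees.get(i + 1).isEmpty()) {                 // underfull: pull smallest from the right
                trees.get(i).add(trees.get(i + 1).pollFirst());
            }
        }
        while (trees.size() > 1 && trees.get(trees.size() - 1).isEmpty())
            trees.remove(trees.size() - 1);                           // drop empty trailing trees
    }

    public static void main(String[] args) {
        MultiTreeDOS d = new MultiTreeDOS(3);
        for (int x : new int[]{5, 6, 2, 3, 8, 12, 9}) d.add(x);
        System.out.println(d.get());                                  // k = ceil(7/3) = 3, prints 5
    }
}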

גלעד ברקן