
I posted a rather confusing question, so I rewrote it from scratch...

This is actually a purely theoretical question.

Say we have a binary heap. Let the heap be a max-heap, so the root node has the biggest value and every node has a bigger value than its children. We can perform some common low-level operations on this heap: "swap two nodes" and "compare two nodes".

Using those low-level operations, we can implement the usual higher-level recursive operations: "sift-up" and "sift-down".

Using sift-up and sift-down, we can implement "insert", "repair" and "update". I am interested in the "update" function. Let's assume that I already have the position of the node to be changed. The update function is then very simple:

function update (node_position, new_value){
    heap[node_position] = new_value;
    sift_up(node_position);
    sift_down(node_position);
}

My question is: is it (mathematically) possible to make a more advanced "update" function that updates several nodes at once, in such a way that all nodes change their values to the new values, and only afterwards their positions are corrected? Something like this:

function double_update (node1_pos, node2_pos, node1_newVal, node2_newVal){
    heap[node1_pos] = node1_newVal;
    heap[node2_pos] = node2_newVal;
    sift_up(node1_pos);
    sift_down(node1_pos);
    sift_up(node2_pos);
    sift_down(node2_pos);
}
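For concreteness, here is a minimal, self-contained C++ sketch of these routines over a 0-based array heap. The names sift_up, sift_down, update and double_update mirror the pseudocode above; everything else (the int element type, the free-function signatures) is an illustrative assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// 0-based max-heap helpers over a std::vector<int>.
static void sift_up(std::vector<int>& heap, std::size_t i) {
    while (i > 0) {
        std::size_t parent = (i - 1) / 2;
        if (heap[parent] >= heap[i]) break;   // heap property holds here
        std::swap(heap[parent], heap[i]);
        i = parent;
    }
}

static void sift_down(std::vector<int>& heap, std::size_t i) {
    for (;;) {
        std::size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < heap.size() && heap[l] > heap[largest]) largest = l;
        if (r < heap.size() && heap[r] > heap[largest]) largest = r;
        if (largest == i) break;              // node dominates both children
        std::swap(heap[i], heap[largest]);
        i = largest;
    }
}

static void update(std::vector<int>& heap, std::size_t pos, int new_value) {
    heap[pos] = new_value;
    sift_up(heap, pos);    // at most one of these two actually moves the node
    sift_down(heap, pos);
}

static void double_update(std::vector<int>& heap,
                          std::size_t p1, std::size_t p2, int v1, int v2) {
    heap[p1] = v1;         // write both values first...
    heap[p2] = v2;
    sift_up(heap, p1);     // ...then correct both positions
    sift_down(heap, p1);
    sift_up(heap, p2);
    sift_down(heap, p2);
}
```

On many inputs (for example when the two updated nodes end up in independent subtrees) double_update restores a valid heap, but as noted above, passing tests proves nothing about the general case.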

I ran some tests with this "double_update" and it worked, although that doesn't prove anything.

What about "triple updates", and so on?

I did some other tests with "multi-updates", where I changed the values of all nodes and then called { sift_up(); sift_down(); } once for each of them, in random order. This didn't work, but the result wasn't far from correct.

I know this doesn't sound useful, but I am interested in the theory behind it. And if I make it work, I actually do have one use for it.

enrey

2 Answers


It's definitely possible to do this, but if you're planning on changing a large number of keys in a binary heap, you might want to look at other heap structures like the Fibonacci heap or the pairing heap which can do this much faster than the binary heap. Changing k keys in a binary heap with n nodes takes O(k log n) time, while in a Fibonacci heap it takes time O(k). This is asymptotically optimal, since you can't even touch k nodes without doing at least Ω(k) work.

Another thing to consider is that if you change more than Ω(n / log n) keys at once, you are going to do at least Ω(n) work. In that case, it's probably faster to implement updates by just rebuilding the heap from scratch in Θ(n) time using the standard heapify algorithm.
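As a sketch of that rebuild strategy (the bulk_update helper and its list of (position, value) changes are my own illustrative names, not from the answer):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Apply all k value changes in place, then restore the heap property with one
// bottom-up heapify pass: Theta(n) total, no matter how many keys changed.
static void bulk_update(std::vector<int>& heap,
                        const std::vector<std::pair<std::size_t, int>>& changes) {
    for (const auto& c : changes)
        heap[c.first] = c.second;                 // overwrite values directly
    std::make_heap(heap.begin(), heap.end());     // standard heapify, Theta(n)
}
```

This beats per-key sifting exactly in the regime the answer describes, where k log n exceeds n.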

Hope this helps!

templatetypedef
  • I'm building the heap with heapify algorithm. Most of the time, I am gonna update exactly two nodes at once. It could be done by calling update() twice = O(2*log n). But it's a simulation script, where the actual value of the node will be changed randomly many, many times, before node position correction is needed. I could separate position correction from actual updating of value, if only I was always dealing with only one node at a time, which isn't this case. I am actually looking for a way to separate position correction from value change while dealing with 2 nodes (usually) at a time. – enrey Dec 09 '12 at 17:49
  • Btw I know it's do-able, I can always add separation layer between node value and heap, making "fake nodes" and updating them only when needed, to get rid of that useless heap updates overheads. But I am more interested in math tricks and funky algorithms :) – enrey Dec 09 '12 at 17:53

Here's a trick and possibly funky algorithm, for some definition of funky:

(Lots of stuff left out, just to give the idea):

template<typename T> class pseudoHeap {
  private:
    using iterator = typename vector<T>::iterator;
    iterator max_node;
    vector<T> heap;
    bool heapified;

    void find_max() {
      max_node = std::max_element(heap.begin(), heap.end());
    }

  public:
    void update(iterator node, T new_val) {
      if (node == max_node) {
          if (new_val < *max_node) {
               heapified = false;
               *max_node = new_val;
               find_max();
           } else {
               *max_node = new_val;
           }
      } else {
          if (new_val > *max_node) max_node = node;
          *node = new_val;
          heapified = false;
      }
    }

    T& front() { return *max_node; }

    void pop_front() {
        if (!heapified) {
            std::iter_swap(heap.end() - 1, max_node);
            std::make_heap(heap.begin(), heap.end() - 1);
            heapified = true;
        } else {
            std::pop_heap(heap.begin(), heap.end());
        }
        heap.pop_back();
    }
};

Keeping a heap is expensive. If you do n updates before you start popping the heap, you've done the same amount of work as just sorting the vector when you need it to be sorted (O(n log n)). If it's useful to know the maximum value at all times, then there is some reason to keep a heap; but if the maximum value is no more likely to be modified than any other value, you can keep the maximum value always handy at amortized cost O(1) (that is, 1/n of the time it costs O(n) and the rest of the time it's O(1)). That's what the above code does, but it might be even better to be lazy about computing the max as well, making front() amortized O(1) instead of constant O(1). It depends on your requirements.

As yet another alternative, if the modifications normally don't cause the values to move very far, just do a simple "find the new home and rotate the subvector" loop. Although that's O(n) instead of O(log n), it's still faster for short moves because the constant is smaller.
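One way to read that suggestion, assuming the values are kept in a fully sorted vector rather than a heap (my interpretation; the answer doesn't spell out the container): change the value in place, binary-search its new home, and std::rotate the elements in between.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Move v[i] to its new sorted position after changing it to new_val.
// Costs O(distance moved) element shifts, so it is cheap for short moves.
static void update_sorted(std::vector<int>& v, std::size_t i, int new_val) {
    if (new_val >= v[i]) {
        // New home lies to the right: shift the in-between elements left.
        auto dest = std::lower_bound(v.begin() + i + 1, v.end(), new_val);
        std::rotate(v.begin() + i, v.begin() + i + 1, dest);
        *(dest - 1) = new_val;
    } else {
        // New home lies to the left: shift the in-between elements right.
        auto dest = std::upper_bound(v.begin(), v.begin() + i, new_val);
        std::rotate(dest, v.begin() + i, v.begin() + i + 1);
        *dest = new_val;
    }
}
```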

In other words, don't use priority heaps unless you're constantly required to find the top k values. When there are lots of modifications between reads, there is usually a better approach.

rici
  • Maybe I shouldn't have said funky, but you actually brought me to two important understandings: Better not rely on academical constructs, and I shouldn't spend time searching perfect algorithm that might not exist. So I got rid of that heap altogether and when I see how it performs in real situation, I'll try templatetypedef's suggestion using pairing heap, together with your code, that seems like cute simple magic. I don't know which answer to choose now, both gave me good info. Thank you. – enrey Dec 23 '12 at 17:20