
I was trying to understand how to convert a list into a min-heap. I wrote my own simple recursive implementation, which seems to work, but I was curious how heapq.heapify does it. Viewing the array as the implicit representation of a binary tree, the strategy is to ensure the heap invariant is satisfied at every index that is not a leaf. This explains the outer loop of the implementation:

def heapify(x):
    n = len(x)
    for i in reversed(range(n//2)):
        _siftup(x, i)
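
The loop stops at n//2 because every index from n//2 onward is a leaf: its left-child index 2*i + 1 already falls outside the array, so the invariant holds there vacuously. A quick sanity check of that claim, on an arbitrary example list:

xs = [9, 4, 7, 1, 3, 8, 5]
n = len(xs)

# indices with no left child (i.e. no children at all) are exactly n//2 .. n-1
leaves = [i for i in range(n) if 2*i + 1 >= n]
assert leaves == list(range(n//2, n))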

So we expect _siftup(x, i) to restore the heap invariant at index i, assuming the subtrees rooted at its children are already valid heaps (which the bottom-up loop guarantees).

def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now.  Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)

def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    # Follow the path to the root, moving parents down until finding a place
    # newitem fits.
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if newitem < parent:
            heap[pos] = parent
            pos = parentpos
            continue
        break
    heap[pos] = newitem
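
To see the two phases on one concrete input, here is a small hand-rolled example (the list is arbitrary; note that both subtrees of the root are already heaps, which is exactly the situation the bottom-up loop in heapify guarantees):

# Only the root (9) violates the invariant; its two subtrees are heaps.
data = [9, 1, 3, 4, 5, 8, 7]

_siftup(data, 0)
print(data)   # [1, 4, 3, 9, 5, 8, 7] -- the invariant now holds everywhere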

So the code first walks down from the starting node, repeatedly promoting the smaller child, until the hole left by the node's original value reaches a leaf (_siftup). It then bubbles that saved value back up from the leaf, moving parents down, until it finds its correct position (_siftdown). Why is this more "efficient" than the naive recursive implementation below? The heapq source states that it is, but does not explain why.

from typing import List

def build_min_heap(arr: List[int]):
    n = len(arr)
    for i in reversed(range(n//2)):
        _make_heap_invariant(arr, i)

def _make_heap_invariant(arr: List[int], i: int):
    n = len(arr)
    l_child_idx = 2*i + 1 if 2*i + 1 < n else None
    r_child_idx = 2*i + 2 if 2*i + 2 < n else None
    if l_child_idx is not None and arr[i] >= arr[l_child_idx]:
        arr[i], arr[l_child_idx] = arr[l_child_idx], arr[i]
        _make_heap_invariant(arr, l_child_idx)
    if r_child_idx is not None and arr[i] >= arr[r_child_idx]:
        arr[i], arr[r_child_idx] = arr[r_child_idx], arr[i]
        _make_heap_invariant(arr, r_child_idx)
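
One way to make the efficiency question concrete is to count comparisons. The sketch below is not part of either implementation: it wraps ints in a hypothetical Counted class that bumps a counter on every < and >= call, then runs both builders (the heapify/_siftup/_siftdown version above and build_min_heap) on copies of the same shuffled input:

import random

comparisons = {"count": 0}

class Counted(int):
    def __lt__(self, other):
        comparisons["count"] += 1
        return int.__lt__(self, other)
    def __ge__(self, other):
        comparisons["count"] += 1
        return int.__ge__(self, other)

values = [Counted(v) for v in random.sample(range(10000), 10000)]

comparisons["count"] = 0
heapify(list(values))           # sift-to-leaf, then sift back up
print("heapq-style:", comparisons["count"])

comparisons["count"] = 0
build_min_heap(list(values))    # naive recursive version
print("naive:      ", comparisons["count"])

The absolute numbers depend on the input, but comparing the two counts gives a concrete handle on the question.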

1 Answer


Most computer science textbooks have a chapter on heaps. The basic invariant is that, for any index i in the heap,

  • heap[i] ≤ heap[2 * i + 1] if 2 * i + 1 is a legal index, and
  • heap[i] ≤ heap[2 * i + 2] if 2 * i + 2 is a legal index.

These two conditions guarantee that heap[0] is the smallest element in the heap.
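
For example, you can check the invariant directly on the output of heapq.heapify (the input list here is arbitrary):

import heapq

data = [9, 4, 7, 1, 3, 8, 5]
heapq.heapify(data)

n = len(data)
# every parent is <= each of its existing children
assert all(data[i] <= data[c]
           for i in range(n)
           for c in (2*i + 1, 2*i + 2) if c < n)
assert data[0] == min(data)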

Deleting the smallest element from the heap and adding an element to the heap can each be done in O(log n) time.
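
Both operations are available directly in heapq; each one only walks a single root-to-leaf (or leaf-to-root) path, which is where the O(log n) bound comes from:

import heapq

h = [5, 9, 8]
heapq.heapify(h)

heapq.heappush(h, 1)         # O(log n): the new item bubbles up from a leaf
smallest = heapq.heappop(h)  # O(log n): the root is removed and the hole re-sifted
print(smallest)              # 1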
