58

I'm learning data structures, and every source tells me not to use index 0 of the array when implementing a heap, without explaining why. I searched the web and StackExchange and couldn't find an answer.

xji
  • 3
    I've never heard of not using index 0 in a heap. It slightly changes the arithmetic for calculating indices (left/right child, parent), but it's pretty insignificant. I've implemented heaps several times and never avoided using 0. – Emmet Apr 06 '14 at 22:20
  • 3
    Although the question is old, I checked the following class - org.apache.commons.collections.BinaryHeap and it starts the heap implementation from index 1. – rents Jul 07 '15 at 02:30

4 Answers

110

There's no reason why a heap implemented in an array has to leave the item at index 0 unused. If you put the root at 0, then the item at array[index] has its children at array[index*2+1] and array[index*2+2]. The node at array[child] has its parent at array[(child-1)/2].

Let's see.

                  root at 0       root at 1
Left child        index*2 + 1     index*2
Right child       index*2 + 2     index*2 + 1
Parent            (index-1)/2     index/2

So having the root at 0 rather than at 1 costs you an extra add to find the left child, and an extra subtraction to find the parent.
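
The two columns of the table can be checked with a few lines of Python. This is only a sketch: the helper names (`left0`, `parent1`, and so on) are made up for illustration and do not come from any library.

```python
# Index arithmetic for a binary heap stored in an array.
# The *0 helpers assume the root is at index 0, the *1 helpers at index 1.
# All names here are illustrative.

def left0(i):
    return 2 * i + 1

def right0(i):
    return 2 * i + 2

def parent0(i):
    return (i - 1) // 2

def left1(i):
    return 2 * i

def right1(i):
    return 2 * i + 1

def parent1(i):
    return i // 2

# Round trip: the parent of either child of i is i again.
for i in range(0, 1000):
    assert parent0(left0(i)) == i and parent0(right0(i)) == i
for i in range(1, 1000):
    assert parent1(left1(i)) == i and parent1(right1(i)) == i

# The two layouts describe the same tree, shifted by one slot.
for i in range(1000):
    assert left0(i) + 1 == left1(i + 1)
    assert right0(i) + 1 == right1(i + 1)
```

Both layouts pass the same round-trip check; the only difference is the `+ 1` and `- 1` terms.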

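For the more general case of a heap that is not binary (a 3-heap, 4-heap, etc.), where each node has NUM_CHILDREN children instead of 2, the formulas are as follows. Division is integer division, and the root-at-1 parent formula applies to any node other than the root. Note that the short root-at-1 forms index*2 and index/2 are special to the binary case:

                  root at 0                               root at 1
First child       index*NUM_CHILDREN + 1                  (index-1)*NUM_CHILDREN + 2
Last child        index*NUM_CHILDREN + NUM_CHILDREN       index*NUM_CHILDREN + 1
Parent            (index-1)/NUM_CHILDREN                  (index-2)/NUM_CHILDREN + 1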

I can't see those few extra instructions making much of a difference in the run time.
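
A brute-force round trip is a quick way to sanity-check the root-at-0 formulas for any number of children. The sketch below is in Python; the function names `children` and `parent` and the parameter `num_children` are hypothetical, chosen only for this example.

```python
# Root-at-0 index arithmetic for a heap with num_children children per node.
# Names are illustrative, not from any library.

def children(i, num_children):
    """Indices of all children of node i, first to last."""
    first = i * num_children + 1
    return range(first, first + num_children)

def parent(i, num_children):
    """Index of the parent of node i (i must be at least 1)."""
    return (i - 1) // num_children

for d in (2, 3, 4):
    seen = []
    for i in range(50):
        for c in children(i, d):
            # Every child maps back to the node it came from.
            assert parent(c, d) == i
            seen.append(c)
    # The children of nodes 0..49 fill indices 1..50*d with no gaps or overlaps.
    assert seen == list(range(1, 50 * d + 1))
```

The gap-free check matters: a formula can satisfy the parent round trip and still leave array slots unused.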

For reasons why I think it's wrong to start at 1 in a language that has 0-based arrays, see https://stackoverflow.com/a/49806133/56778 and my blog post "But that's the way we've always done it!"

Jim Mischel
  • It would be interesting to see how Java or C++ implement a heap (whether they start at 0 or 1) in their API (IF they provide a heap api in the first place) – rents Jul 07 '15 at 02:24
  • It is actually implemented this way in most places. In languages that support it, such as C or C++, one possibility is to decrement the pointer to the array. You then cannot dereference it directly, as that position is not assigned, but you can reach the first element of the array with index 1 instead of 0. You are effectively turning the array from zero-based to one-based. – Juan Aug 10 '17 at 11:37
  • 3
    @Juan: Are you sure about that? I'm looking at C++ STL code for `priority_queue`, and it's 0-based. I don't know what you consider "most places", but as I recall the Java and Python heap implementations also are 0-based. In practice, the only places I see 1-based heaps are in college student projects, and the few people who roll their own heaps rather than use the provided libraries. – Jim Mischel Aug 10 '17 at 12:47
  • 1
    Sorry @Jim, I wrote it in a way that leads to confusion. I meant that in most places it is indeed 0-based. When I say it is implemented "this way", I mean the way you explain in your answer. Apart from that, I don't consider it a bad idea to decrement the base pointer of the array (or a copy of it) and work with a 1-based array. Of course, you cannot do that in Java :) – Juan Aug 11 '17 at 19:22
  • I might be late to the party. My guess is that with a 1-based array you can find the parent, left child, and right child with a right shift, a left shift, and a left shift (giving an even number) plus 1, respectively. That is the performance boost developers are aiming for at large scale. – Deepak Oct 27 '22 at 10:40
  • @Deepak But then why would all the major libraries' binary implementations use 0-based heaps. C++ standard library. C# PriorityQueue. Java PriorityQueue. Python heapq. They all use 0-based heaps. You'd think they'd happily waste a single array element to get a performance boost. But none of them do. – Jim Mischel Feb 05 '23 at 03:46
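
For Python at least, the 0-based layout is easy to observe, because `heapq` works on an ordinary list and its documentation states the invariant as `heap[k] <= heap[2*k+1]` and `heap[k] <= heap[2*k+2]`. A small check, with arbitrary test data:

```python
import heapq
import random

# Arbitrary test data; the seed only makes the run repeatable.
rng = random.Random(0)
items = [rng.randrange(1000) for _ in range(200)]

heap = []
for x in items:
    heapq.heappush(heap, x)

# The smallest item sits at index 0, not index 1.
assert heap[0] == min(items)

# Every node is <= its children at 2*k+1 and 2*k+2 (root-at-0 layout).
for k in range(len(heap)):
    for child in (2 * k + 1, 2 * k + 2):
        if child < len(heap):
            assert heap[k] <= heap[child]
```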
29

As I found in the CLRS book, there is some significance in terms of performance, since shift operators are generally very fast.

On most computers, the LEFT procedure can compute 2*i in one instruction by simply shifting the binary representation of i left by one bit position. Similarly, the RIGHT procedure can quickly compute 2*i+1 by shifting the binary representation of i left by one bit position and then adding in a 1 as the low-order bit. The PARENT procedure can compute i/2 by shifting i right one bit position.

So starting the heap at index 1 will probably make calculating the parent, left child, and right child indexes faster.
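
The bit operations described in the quoted passage can be written out as below. This Python sketch only demonstrates that the shifts give the same indices as the arithmetic for a root at index 1; it says nothing about speed, which depends on the compiler and the hardware. The function names are illustrative.

```python
# LEFT, RIGHT and PARENT for a heap rooted at index 1, written with bit operations.

def left(i):
    return i << 1          # same as 2 * i

def right(i):
    return (i << 1) | 1    # low bit is 0 after the shift, so OR-ing 1 adds 1

def parent(i):
    return i >> 1          # same as i // 2 for non-negative i

for i in range(1, 1 << 12):
    assert left(i) == 2 * i
    assert right(i) == 2 * i + 1
    assert parent(i) == i // 2
```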

Kuanysh
  • 4
    That really doesn't matter on any CPU built in the last 20 years. For one, accessing any element at all takes hundreds of times longer than the add, thousands if it is a cache miss. Also, since the add happens unconditionally, it never stalls the pipeline. As for doing a shift instead of a divide, that might be useful as it frees up execution units, but any compiler worth considering knows that `/2` can be replaced by a shift and will do that for you if you write `i/2` – Niklas Schnelle Jun 23 '19 at 19:20
  • 1
    To add to that, if allocations are aligned by default doing `peekMin()` at position 1 instead of 0 could (depending on the datatypes) easily make the access much more expensive than the add. – Niklas Schnelle Jun 23 '19 at 19:23
7

As observed by AnonJ, this is a question of taste rather than technical necessity. One nice thing about starting at 1 rather than 0 is that there's a bijection between binary strings x and the positive integers that maps a binary string x to the positive integer written 1x in binary. The string x gives the path from the root to the indexed node, where 0 means "take the left child", and 1 means "take the right child".
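
A minimal sketch of that bijection, assuming a root at index 1. The function names are hypothetical; `bin` and `int(..., 2)` are the standard Python built-ins.

```python
def path_from_root(i):
    """Path to node i as a string: '0' = take the left child, '1' = take the right child."""
    if i < 1:
        raise ValueError("indices start at 1")
    # bin(i) looks like '0b1xxx'; drop the '0b' prefix and the leading 1.
    return bin(i)[3:]

def index_from_path(path):
    """Inverse mapping: the index whose binary form is 1 followed by the path."""
    return int("1" + path, 2)

def walk(path):
    """Follow the path from the root using the usual child formulas."""
    i = 1
    for step in path:
        i = 2 * i + (1 if step == "1" else 0)
    return i

for i in range(1, 1000):
    p = path_from_root(i)
    assert index_from_path(p) == i
    assert walk(p) == i
```

For example, index 6 is `110` in binary, so its path is `10`: right child of the root (index 3), then left child (index 6).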

Another consideration is that the otherwise unused "zeroth" location can hold a sentinel with value minus infinity that, on architectures without branch prediction, may mean a non-negligible improvement in running time due to having only one test in the sift up loop rather than two.
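
A sketch of the sentinel idea for a min-heap, assuming keys are numbers so that `-math.inf` compares below all of them. The names `new_heap` and `push` are made up for this example, and Python is used only to show the loop structure, not to measure any speedup.

```python
import math

def new_heap():
    # Slot 0 holds the sentinel; real items start at index 1.
    return [-math.inf]

def push(heap, key):
    """Insert key into a 1-based min-heap whose slot 0 is -inf."""
    heap.append(key)
    i = len(heap) - 1
    # One test per iteration. No separate "i > 1" check is needed:
    # at the root the parent slot is index 0, and key < -inf is never true.
    while key < heap[i >> 1]:
        heap[i] = heap[i >> 1]   # move the parent down
        i >>= 1
    heap[i] = key

h = new_heap()
for x in [5, 3, 8, 1, 9, 2]:
    push(h, x)

assert h[1] == 1
# Heap property for every non-root node.
assert all(h[i >> 1] <= h[i] for i in range(2, len(h)))
```

Without the sentinel, the loop condition would need a second test such as `i > 1 and key < heap[i >> 1]`.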

David Eisenstat
6

(While I was searching, I came up with an answer of my own but I don't know whether it's correct or not.)

If index 0 is used for the root node then subsequent calculations on its children cannot proceed, because we have indexOfLeftChild = indexOfParent * 2 and indexOfRightChild = indexOfParent * 2 + 1. However 0 * 2 = 0 and 0 * 2 + 1 = 1, which cannot represent the parent-children relationship we want. Therefore we have to start at 1 so that the tree, represented by array, complies with the mathematical properties we desire.

Bolo
xji
  • 9
    We don't **have to** start at 1, since nothing is forcing us to use those equations as is, but starting at 0 will add a few `-1`s and `+1`s to the equations. – Bernhard Barker Apr 06 '14 at 21:44
  • 3
    @Dukeling OK, so the heap, as defined mathematically (conceptually), should have a root with index "1" (the whole structure starts at 1). We might choose to implement this root with array[0], but if so we have to add some `+1`s and `-1`s, which is a little annoying. So normally we start at array[1]. Am I right in this interpretation? – xji Apr 06 '14 at 21:54