I am writing a binary heap over an array arr.

Every node except the leaf nodes has two children (the last internal node may have only one).

The root can be at arr[0] or arr[1].

The accepted answer at "Why in a heap implemented by array the index 0 is left unused?" says arr[1] is faster.

But one comment below that answer says most implementations put the root at arr[0].

What are the benefits of putting the root at arr[0]?

R zu
  • Saves space? Frankly, the link you posted, which says any performance difference will be minimal, seems to answer your question. –  Apr 12 '18 at 15:47
  • Erh. save space of 1 node to make every operation a bit slower? If that is the case, I guess I will put root at `arr[1]`. – R zu Apr 12 '18 at 15:47
  • If you want to index from `1`, go write the code in FORTRAN; if you want to write proper C, array indexes start at `0`. – David C. Rankin Apr 12 '18 at 21:58
  • @DavidC.Rankin The funny part is that in those 1-based languages, the generated code is 0-based. The compiler does the index offset for you. – Jim Mischel May 01 '18 at 23:41
  • I guess it was easier for humans to number punch-cards from `1, 2, ..` instead of numbering them `0, 1, ...` so the creators of the language carried that through to indexes as well. It's telling to recall that FORTRAN has been around since engineers were carrying slide-rules in their shirt pockets... – David C. Rankin May 01 '18 at 23:45

1 Answer

I am the person who answered the question you linked.

Creating a binary heap that has the root at arr[1] in a language that has 0-based arrays is idiotic. Not because it wastes a trivial amount of space, but because it creates unnecessarily confusing code for no benefit.

Why is the code confusing? Because it breaks a fundamental rule that we as programmers have been working under for years: arrays start at 0. If you want an array that holds 100 items, you declare it that way:

int a[100];

Except for a binary heap. Because some idiot who converted the original binary heap code from Algol (whose arrays are 1-based) to C (0-based arrays) back in 1973 didn't have the brains to change the child and parent calculations, we've ended up with this one special case where to hold 100 items you have to allocate 101:

int a[101];

And when somebody called that person on the inconsistency, he came up with a specious performance argument.

Yes, there is an extra increment instruction in the code for computing the left child index, and an extra decrement instruction when computing a child's parent index. In the wider context of what a binary heap does, those two instructions will make no practical difference to the running time of any program that uses the heap. None. If the difference is even measurable, it will definitely be noisy. There are many other things happening on your computer that will have much larger effects on your program's performance.
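To make that arithmetic concrete, here is a rough sketch of the index calculations for both layouts. The helper names (`left0`, `parent1`, and so on) are made up for illustration and are not taken from any library.

```c
#include <assert.h>
#include <stddef.h>

/* 0-based layout: root at arr[0]. Hypothetical helper names. */
static inline size_t left0(size_t i)  { return 2 * i + 1; }  /* the extra increment */
static inline size_t right0(size_t i) { return 2 * i + 2; }
static inline size_t parent0(size_t i)
{
    assert(i > 0);          /* the root has no parent; i - 1 would wrap for i == 0 */
    return (i - 1) / 2;     /* the extra decrement */
}

/* 1-based layout: root at arr[1], arr[0] left unused. */
static inline size_t left1(size_t i)  { return 2 * i; }
static inline size_t right1(size_t i) { return 2 * i + 1; }
static inline size_t parent1(size_t i)
{
    assert(i > 1);          /* the root (index 1) has no parent */
    return i / 2;
}
```

The two layouts describe the same tree shifted by one slot: node `i` in the 1-based layout is node `i - 1` in the 0-based one, so `left1(i) == left0(i - 1) + 1`.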

If you're writing a program that requires a high-performance priority queue, what the heck are you doing with a binary heap in the first place? If you're really going to store huge numbers of things in your priority queue, you probably should be using something like a pairing heap, which will outperform a binary heap, although at a higher memory cost.

The C++ STL priority_queue, the Java PriorityQueue, and Python's heapq all use 0-based binary heaps. The people who wrote those packages understand performance considerations. If there were a significant performance gain to going with a 1-based binary heap, they would have done so. That they went with 0-based heaps should tell you that any performance gain from a 1-based heap is illusory.
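For comparison, a minimal 0-based min-heap in C might look something like the sketch below. It is not code from any of those libraries. The `IntHeap` type, the `HEAP_CAP` constant, and the function names are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_CAP 100        /* hypothetical capacity: 100 items, 100 slots */

typedef struct {
    int data[HEAP_CAP];     /* root lives at data[0]; no slot is wasted */
    size_t size;
} IntHeap;

static inline void heap_swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Insert v. Returns 0 on success, -1 if the heap is full. */
static inline int heap_push(IntHeap *h, int v)
{
    assert(h != NULL);
    if (h->size == HEAP_CAP) return -1;
    size_t i = h->size++;
    h->data[i] = v;
    while (i > 0) {                         /* sift up */
        size_t p = (i - 1) / 2;             /* parent of i */
        if (h->data[p] <= h->data[i]) break;
        heap_swap(&h->data[p], &h->data[i]);
        i = p;
    }
    return 0;
}

/* Remove the smallest item into *out. Returns 0 on success, -1 if empty. */
static inline int heap_pop(IntHeap *h, int *out)
{
    assert(h != NULL && out != NULL);
    if (h->size == 0) return -1;
    *out = h->data[0];
    h->data[0] = h->data[--h->size];        /* move last item to the root */
    size_t i = 0;
    for (;;) {                              /* sift down */
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->size && h->data[l] < h->data[m]) m = l;
        if (r < h->size && h->data[r] < h->data[m]) m = r;
        if (m == i) break;
        heap_swap(&h->data[i], &h->data[m]);
        i = m;
    }
    return 0;
}
```

The only places the layout shows up are the `(i - 1) / 2` and `2 * i + 1` expressions. Everything else, including the loop bounds and the array declaration, follows ordinary 0-based C conventions.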

See my blog post "But that's the way we've always done it!" for a more complete discussion.

Jim Mischel
  • I think that captures it nicely. Many algorithms were pulled from early languages like Algol or FORTRAN by lazy coders who simply copied and implemented the code without any forethought to proper indexing in C. The "Numerical Recipes in C" book is a prime example. It's enough to drive any principled C programmer batty having to rewrite and reindex what are supposedly C routines. – David C. Rankin Apr 12 '18 at 21:56