Data Structures: If Heaps are trees, why are they implemented internally with Lists or arrays?

Question

I'm giving myself a refresher course in data structures and algorithms (and learning new stuff -- I was an Information Systems major in college instead of Computer Science, so I didn't get a formal education in these things), and I've been working on heaps. I'm a little confused.

My understanding is that a Heap is basically a semi-sorted tree, where the value of each child node is guaranteed to be less than the value of its parent (assume for this discussion MinHeaps). So if it's a tree, why is it that every implementation I've seen has used an array-like structure internally instead of building a set of tree nodes?

It seems weird to me that I have to memorize that the children of N in an array sit at 2N + 1 (left) and 2N + 2 (right)*. Why not just build a node with Left and Right properties and go from there?

*Source: This article

Why store data (the pointers) when you don't have to? Implicitly knowing where the children are is a huge benefit, as is implicit knowledge of where the parent is. — Jim Mischel, Jun 05 '16 at 15:18

Sorin · Accepted Answer · 2019-02-12T09:38:58.633

TL;DR: Save on memory overhead, get more speed from data locality.

For a binary tree you need in each node 4 bytes for left child and 4 byte for the right child (or 8+8 if you are on 64 bit system). That's just the bare pointers you need. If you store a 32 bit int that's a lot of overhead. Add another pointer for the parent that is needed for pushing a node towards the root and you are looking at 24 bytes of overhead for a 4 byte integer on a 64bit system.

For a heap you don't need to worry about arbitrary trees. You usually only worry about the head (the min/max of the values) and you don't care about the internal structure. The heap is an almost complete binary tree (all levels are filled except the last one which is filled left to right). In this structure if you just put the nodes in an array, then for a node at index x you always find the parent at (x+1)/2 the left child at x*2+1 and the right child at x*2+2. So no need to store any of those fat pointers.

In addition to the space saved you also get a speed boost because the memory is contiguous so it's more likely that it will be cached together (not guaranteed, just more likely).

Sure, if it not something where efficiency is important, you can implement it as a regular tree. And conversely, if you have a almost complete tree and you want to the most out of you system implement it with an array (even if you don't use it as a heap).

Also, it's worth noting there are many other "tree" data structures that can be implemented using arrays. Just off the top of my head I can list Binary Indexed Trees (aka Fenwick Trees) and Segment Trees. — Juan Lopes, Jun 04 '16 at 21:31
Reviving this old thread.. So I have the opposite question - why binary search tree is not implemented as array? What property of the binary search tree makes it better to be implemented as a tree? — paranoidAndroid, Aug 17 '20 at 23:22
@paranoidAndroid because of overhead again. If your root only has a left child then at least half of the array will be empty. Another reason is that you can't do rotations efficiently on the array based structure. All AVL or RB tree algorithms would become horribly slow. One last reason is that you might want pointers. Think of a file base or network based tree, you might not load a subtree because it's rarely accessed, so you can have a 'pointer' that says the subtree is in this file or on that machine (not the same as the language pointer, but similar concept). — Sorin, Dec 11 '20 at 08:51

score 2 · Answer 2 · answered Jun 04 '16 at 21:12

First, let's make a little clarification about vocabulary:

A (min-oriented, but I won't precise it every time) priority queue is an abstract data structure which implements the operations add and deleteMin and sometimes decreaseKey. Actually, you can make a simple array/list where you loop through the structure to find the minimum, and you would have implemented a priority queue (a very inefficient one, but still).
A heap is a tree-based data structure that satisfies the heap property (Wikipedia). The heap property is: a parent has a lower key than its childs.
The data structure you are describing is the very common binary heap which is a kind of heap, but not the only one. (Nor the most efficient but that's another story).

The first time I heard of the binary heap, I also thought it was very weird to have a tree in an array, and you'd have to make some strange multiplications to get to the childs/parent.

It's more difficult to represent it in your head, but it makes perfect sense, if you look a little bit closer:

A binary heap is nearly balanced, that is there will never be a hole in the array. (This simple property is awesome in itself, because holes in arrays are a real pain).
It takes less space: an array is MUCH more memory efficient than nodes.
As it is designed, it is quite easy to abstract the array as a binary tree. You can create helpers like getRight(int node), getLeft(int node) and getParent(int node), and the implementation will look more familiar.

However, a binary heap is not cache-efficient, because childs are very far away from the parent, though it may be more cache-efficient than a node-based equivalent binary heap.

Now if you look at the pros and cons, the only con is that the array-based binary heap requires one more step to picture it in one's head, but it's winning everything else.

I don't know if the original heap was designed as an array, but somehow one day someone found this implementation and the array has become the standard for binary heaps.

However, other kind of heaps are implemented with nodes, so it's a special case.

Data Structures: If Heaps are trees, why are they implemented internally with Lists or arrays?

2 Answers2