
I was implementing a heap sort and started wondering about the different ways to implement a heap. When you don't need to access the elements by index (as you do in a heap sort), what are the pros and cons of implementing a heap with an array versus building it like any other linked data structure?

I think it's important to take into account the memory wasted by the nodes and pointers vs the memory wasted by empty spaces in an array, as well as the time it takes to add or remove elements when you have to resize the array.

When should I use each one, and why?

Topo
  • I would think an array would be the only implementation you would want to consider, due to constant-time index-based access. Linked lists cannot do this. – Merlyn Morgan-Graham Jun 27 '11 at 08:53
  • I know that for a heap sort you need the array because of the index, but what about other uses for a heap, such as a priority queue? In that case I think that for high values of n (n being the number of elements), the space wasted in the empty slots of an array is greater than the memory occupied by the nodes and pointers of a linked heap. – Topo Jun 27 '11 at 09:03
  • I think I get the direction of your question now; it isn't about "should I use arrays or linked lists to implement heap sort", it is "in different situations, when would it be better to use a linked list for a heap rather than an array". The way you phrased your question, it looks like you are only asking about heap sort implementations. – Merlyn Morgan-Graham Jun 27 '11 at 09:13
  • @Merlyn Yes, that's what I meant. The question came up while I was implementing a heap sort, but I want to know for which uses or under which conditions I should use the linked or the array implementation of the heap. – Topo Jun 27 '11 at 09:21

1 Answer

As far as space is concerned, there's very little issue with using arrays if you know ahead of time how much is going into the heap, because the values in the heap can always be pointers to the larger structures. This may give you better cache locality on the heap itself, but you still have to go out to memory somewhere else for the extra data. Ideally, if your comparison is based on a small morsel of data (often just a 4-byte float or integer), you can store that as the key alongside a pointer to the full data and get good cache locality.
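To make the key-plus-pointer layout concrete, here is a minimal sketch in C. The `struct job` payload and the names `heap_entry` and `entry_less` are hypothetical, chosen only for illustration; the point is that comparisons read the small key stored in the array slot and never follow the pointer.

```c
#include <stdint.h>

/* Hypothetical payload type: stands in for the "larger structure". */
struct job {
    char   name[64];
    double samples[32];
};

/* One heap slot: a small comparison key stored next to a pointer
 * to the full record. An array of these stays compact in memory. */
struct heap_entry {
    int32_t     key;   /* 4-byte key used for ordering */
    struct job *data;  /* full record lives elsewhere */
};

/* Ordering test used while sifting: touches only the keys,
 * so the pointed-to records are not loaded during heap operations. */
static int entry_less(const struct heap_entry *a, const struct heap_entry *b)
{
    return a->key < b->key;
}
```

The record behind `data` is only dereferenced once an entry has been popped and is actually about to be used.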

Heap sort is already not particularly cache-friendly while traversing the heap structure itself, however. For small heaps that fit entirely in L1/L2 cache, it's not really so bad. Once you start hitting main memory, though, performance will nosedive. Usually this isn't an issue, but if it is, merge sort is your savior.

The larger problem comes in when you want a heap of undetermined size. However, this still isn't so bad, even with arrays. These days, in non-embedded environments with nice memory systems, growing an array with a few calls (e.g. realloc, please forgive my C background) really isn't all that slow, because the data may not need to physically move in memory; for most of it, some address remapping is enough. Add to that the fact that if you use an array-size-doubling strategy (when the array is too small, double its size in a realloc call), the total cost of copying over n insertions is still O(n), i.e. O(1) amortized per insertion, with relatively few reallocs and at most double the space wasted. But hey, you'd get that overhead with linked lists anyway if you're using a 32-bit key and a 32-bit pointer.
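The doubling strategy can be sketched as follows. This is a minimal illustration, not a tuned implementation: it assumes a min-heap of plain `int` values, and the names `int_heap`, `heap_push`, `heap_pop`, and `heap_free` as well as the starting capacity of 4 are choices made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct int_heap {
    int   *items;  /* backing array, NULL while empty */
    size_t len;    /* number of elements in use */
    size_t cap;    /* number of elements allocated */
};

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Insert a value. Returns 0 on success, -1 if realloc fails
 * (in which case the heap is left unchanged). */
int heap_push(struct int_heap *h, int value)
{
    if (h->len == h->cap) {
        /* Array is full: double it (4 is an arbitrary starting size). */
        size_t new_cap = h->cap ? h->cap * 2 : 4;
        int *p = realloc(h->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        h->items = p;
        h->cap = new_cap;
    }
    size_t i = h->len++;
    h->items[i] = value;
    while (i > 0) {                        /* sift up */
        size_t parent = (i - 1) / 2;
        if (h->items[parent] <= h->items[i])
            break;
        swap_int(&h->items[parent], &h->items[i]);
        i = parent;
    }
    return 0;
}

/* Remove and return the smallest value. The heap must not be empty. */
int heap_pop(struct int_heap *h)
{
    assert(h->len > 0);
    int top = h->items[0];
    h->items[0] = h->items[--h->len];
    size_t i = 0;
    for (;;) {                             /* sift down */
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->len && h->items[l] < h->items[m]) m = l;
        if (r < h->len && h->items[r] < h->items[m]) m = r;
        if (m == i)
            break;
        swap_int(&h->items[i], &h->items[m]);
        i = m;
    }
    return top;
}

/* The whole heap goes away with one deallocation. */
void heap_free(struct int_heap *h)
{
    free(h->items);
    h->items = NULL;
    h->len = h->cap = 0;
}
```

Starting from a zero-initialized `struct int_heap`, seven pushes trigger only two allocations (capacity 4, then 8), which is where the "relatively few reallocs" claim comes from.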

So, in short, I'd stick with arrays for the smaller base data structures. When the heap goes away, the pointers I no longer need go away with it in a single deallocation. However, in my opinion pointer-based code for heaps is easier to read, since the indexing magic isn't quite as straightforward. If performance and memory aren't a concern, I'd recommend the pointer-based version to anyone in a heartbeat.
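For comparison, the "indexing magic" of the array layout and one possible linked layout look roughly like this. Both are sketches under assumptions: the helper names and the `heap_node` type are hypothetical, and a real linked heap may not need all three pointers.

```c
#include <stddef.h>

/* Array layout: the tree shape is implied by position, so the
 * "links" are computed rather than stored (0-based indices). */
static size_t heap_parent(size_t i) { return (i - 1) / 2; }  /* only valid for i > 0 */
static size_t heap_left(size_t i)   { return 2 * i + 1; }
static size_t heap_right(size_t i)  { return 2 * i + 2; }

/* Linked layout: each node stores its links explicitly. The code
 * that walks it reads more directly, at the cost of up to three
 * pointers of overhead per element. */
struct heap_node {
    int               key;
    struct heap_node *parent;
    struct heap_node *left;
    struct heap_node *right;
};
```

With the array, moving to a child is `i = heap_left(i)`; with nodes it is `n = n->left`, which is arguably clearer but costs the extra per-node memory.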

Kaganar