My data structure needs three operations:
- insert an element at a random place in the ordering
- find and remove smallest element
- (rarely) delete an element via some key returned at insert time
The existing code is a single-linked list and does a linear search to find an insert point. O(n).
Finding and removing the smallest element is trivial: pull off and dispose of the head link. O(1).
The insert returns a pointer to the link, and the delete call gets that pointer. Were it a double-linked list, the link could simply be unlinked. O(1). Alas, the list is single-linked, so the list is searched for the node at this address, which makes it O(n). This search is expensive, but it does allow detection of an attempt to remove a node twice in some cases: attempted deletion of a node that is simply not on the list won't find it, so it won't do anything except generate a warning in the log. On the other hand, the nodes are stored in a LIFO memory pool and so are likely to be reused, so an accidental re-deletion of a node may well remove some other node instead.
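For reference, here is a minimal sketch of the existing scheme as described above. The names (`Link`, `list_insert`, `list_pop_min`, `list_remove`) and the `int` key are made up for illustration, and `malloc`/`free` stand in for the LIFO memory pool.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node layout: one sort key plus the forward link. */
typedef struct Link {
    int key;
    struct Link *next;
} Link;

/* Insert in ascending key order: O(n) walk to find the insert point.
   Returns the new link, which doubles as the handle for list_remove. */
static Link *list_insert(Link **head, int key) {
    assert(head != NULL);
    Link *n = malloc(sizeof *n);      /* stand-in for the memory pool */
    if (!n) return NULL;
    n->key = key;
    Link **pp = head;
    while (*pp && (*pp)->key <= key)
        pp = &(*pp)->next;
    n->next = *pp;
    *pp = n;
    return n;
}

/* Remove the smallest element (the head): O(1). Returns 0 if empty. */
static int list_pop_min(Link **head, int *out) {
    Link *n = *head;
    if (!n) return 0;
    *out = n->key;
    *head = n->next;
    free(n);
    return 1;
}

/* Delete by address: O(n) search. A link that is not on the list is
   left alone and only produces a warning. */
static int list_remove(Link **head, Link *target) {
    for (Link **pp = head; *pp; pp = &(*pp)->next) {
        if (*pp == target) {
            *pp = target->next;
            free(target);
            return 1;
        }
    }
    fprintf(stderr, "warning: link %p is not on the list\n", (void *)target);
    return 0;
}
```

The search in `list_remove` is what gives the partial double-delete detection: it only fails to protect me once the pool has handed the same address out again.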
OK, with a heap, the insert is O(log n). Delete of minimum is O(log n). Both simple.
But what of delete-by-key? If I keep the heap in an array, it's basically a linear search, O(n). I move the elements around in the heap to keep the heap property (bubbling down and up as needed), so I can't just use the node's address. Plus, unless you accept a fixed maximum size, you need to reallocate the array which typically moves it.
I'm thinking maybe the heap could be an array of POINTERS to the actual nodes, which live elsewhere. Each node would have its array index in it, and as I move pointers-to-nodes around in the heap, I'd update the node with its new array index. Thus a request to delete a node could supply me with the node. I use the node's stored index into the heap and delete that pointer, so delete-by-key is now O(log n). It just seems far more complicated.
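To get a feel for how much extra code that is, here is a rough sketch of the pointer-array idea. All names (`Node`, `Heap`, `heap_index`, `heap_insert`, `heap_remove`, `heap_pop_min`) are hypothetical, the key is assumed to be an `int`, and the caller is assumed to own the node memory. Every move of a pointer goes through one helper that also rewrites the node's stored index.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    int key;
    size_t heap_index;              /* current slot in the heap array */
} Node;

typedef struct Heap {
    Node **slots;                   /* array of pointers; nodes live elsewhere */
    size_t len, cap;
} Heap;

/* The only place a pointer is stored, so the index never goes stale. */
static void heap_place(Heap *h, size_t i, Node *n) {
    assert(i < h->cap);
    h->slots[i] = n;
    n->heap_index = i;
}

static void sift_up(Heap *h, size_t i) {
    Node *n = h->slots[i];
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->slots[parent]->key <= n->key) break;
        heap_place(h, i, h->slots[parent]);
        i = parent;
    }
    heap_place(h, i, n);
}

static void sift_down(Heap *h, size_t i) {
    Node *n = h->slots[i];
    for (;;) {
        size_t child = 2 * i + 1;
        if (child >= h->len) break;
        if (child + 1 < h->len &&
            h->slots[child + 1]->key < h->slots[child]->key)
            child++;                /* pick the smaller child */
        if (n->key <= h->slots[child]->key) break;
        heap_place(h, i, h->slots[child]);
        i = child;
    }
    heap_place(h, i, n);
}

/* Insert: O(log n). Growing the array may move it, but nodes stay put. */
static int heap_insert(Heap *h, Node *n) {
    if (h->len == h->cap) {
        size_t cap = h->cap ? h->cap * 2 : 16;
        Node **p = realloc(h->slots, cap * sizeof *p);
        if (!p) return 0;
        h->slots = p;
        h->cap = cap;
    }
    heap_place(h, h->len, n);
    h->len++;
    sift_up(h, n->heap_index);
    return 1;
}

/* Delete by handle: O(log n). Returns 0 if the node is not in the heap. */
static int heap_remove(Heap *h, Node *n) {
    size_t i = n->heap_index;
    if (i >= h->len || h->slots[i] != n) return 0;
    Node *last = h->slots[--h->len];
    if (last != n) {
        heap_place(h, i, last);     /* fill the hole with the last entry */
        sift_up(h, i);
        sift_down(h, last->heap_index);
    }
    n->heap_index = (size_t)-1;     /* mark as removed */
    return 1;
}

/* Delete minimum: O(log n). Returns NULL when the heap is empty. */
static Node *heap_pop_min(Heap *h) {
    if (h->len == 0) return NULL;
    Node *min = h->slots[0];
    heap_remove(h, min);
    return min;
}
```

A side effect of this layout is that the check `h->slots[i] != n` catches a second delete of the same node in O(1), at least until the node's memory is reused for something else.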
Given the extra overhead of allocating non-moving nodes separately and keeping their array index field updated, this sounds like it might cost more than the very occasional linear search. OTOH, an advantage of keeping nodes separate from the array heap is that it's faster to swap pointers than whole nodes (which in my case may be 32 bytes or more).
Any simpler ideas?