-1

I am trying to implement a custom min-heap in python. I want to store tuples in the heap and order them with respect to the second element in the tuple. Here is my attempt at it:-

class minheap():
  def __init__(self):
      self.heap = list()
  def parent(self,i):
    return (i-1)//2
  def left_child(self,i):
    return 2*i+1
  def right_child(self,i):
    return 2*i+2

  def insert(self,l,i):
    h = self.heap
    h.append((l,i))
    j=len(h)-1
    while j>0:
      if h[self.parent(j)][1] > h[j][1]:
        j = self.parent(j)
      else:
        h[self.parent(j)] , h[j] = h[j] , h[self.parent(j)]
        break
    return
  
  def delmin(self):
    h = self.heap
    h[0], h[-1] = h[-1], h[0]
    m = h.pop(-1)
    j=0
    while True:
      if self.left_child(j) > len(h)-1:
        break
      if h[self.left_child(j)][1] < h[j][1]:
        h[self.left_child(j)], h[j] = h[j], h[self.left_child(j)]
        j = self.left_child(j)
      if self.right_child(j) > len(h)-1:
        break
      elif h[self.right_child(j)][1] < h[j][1]:
        h[self.right_child(j)], h[j] = h[j], h[self.right_child(j)]
        j = self.right_child(j)
      else:
        break
    return m

I have tested the time it takes for insertion and deletion and it seems like each of them have a complexity of O(logn). These are the times I have recorded for each of them and n signifies list(range(n)) converted to a heap:-

Insert

Size(n) Time
10000 0.01562
100000 0.06099
1000000 0.62161

Delmin

Size(n) Time
10000 0.02703
100000 0.15629
1000000 1.22780

I have also used heapq functions to create a heap although of numbers only, and the time taken by them is much lower.

heappush

Size(n) Time
10000 0.0
100000 0.00800
1000000 0.08599

heappop

Size(n) Time
10000 0.00199
100000 0.02472
1000000 0.28575

Could you please tell me why the custom functions are much slower than the functions that come in the default package?

Is adding tuples instead of numbers the main factor slowing the process or is there some problem with my algorithm that is causing a problem here?

trincot
  • 317,000
  • 35
  • 244
  • 286
Sharan Kumar
  • 139
  • 1
  • 2
  • 13
  • Try comparing apples to apples. How fast is yours if you use the same "numbers only"? – Kelly Bundy Feb 03 '22 at 17:01
  • The standard `heapq` module is written entirely in Python, you can look at it to see what they might be doing differently from you. – jasonharper Feb 03 '22 at 17:03
  • Using just numbers would also make it easier to test. Your insert function looks wrong to me. – Kelly Bundy Feb 03 '22 at 17:09
  • @jasonharper If you scroll to the bottom of the Python version, you'll see it replaces stuff with the C version. – Kelly Bundy Feb 03 '22 at 17:37
  • @KellyBundy The C implementation has been added since the last time I looked at the `heapq` code. How dare they make such an improvement without personally informing me :) (I think this definitely answers the question at hand, though.) – jasonharper Feb 03 '22 at 17:42
  • @jasonharper So you last looked eight years ago? :-) Yes, using C is probably one of the reasons. This one using functions even for child/parent calculations is another. Using tuples instead of numbers is yet another. All that said, this code is indeed wrong, so I'd say it doesn't even matter how fast it is at being wrong. – Kelly Bundy Feb 03 '22 at 17:53
  • @KellyBundy Thanks for the help. I have tested with numbers only insert and there isn't any appreciable difference in time when inserting tuples. Could you also tell me what is wrong with the insert function if possible? – Sharan Kumar Feb 04 '22 at 04:21

1 Answers1

1

There is nothing wrong with the complexity of your algorithm (note the appoximate factor of 10 of increased time when operating on 1e6 instead of 1e5 values). The standard library functions are just faster by a constant factor. That is probably because they are optimized and maybe even written in a compiled language, that can run much faster.

Jakob Stark
  • 3,346
  • 6
  • 22