-2

I followed the docs for heapq.merge(), but I get very strange results and cannot figure out what I am doing wrong. The setup is as follows:

  1. I am using heapq.merge() to merge multiple sorted lists. I tested with 2 to 8 list iterators, and the results are exactly the same. The lists contain 10K to 25K items each.
  2. The list elements themselves implement everything required for the lists to be sorted (__lt__(), __eq__(), ...).
  3. I verified that these special comparison methods are called, both when sorting the lists themselves and while heapq.merge() runs.
  4. I made sure that the lists do not contain any duplicate entries, not even across lists, by appending a simple running number to each element and using it in the comparison.
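The setup described above can be sketched roughly as follows. The `Item` class, its `key` and `seq` fields, and the list sizes are hypothetical stand-ins chosen for illustration, not the original code; the sketch only shows what a correct call is expected to produce.

```python
import heapq
import itertools
import random
from functools import total_ordering


@total_ordering
class Item:
    """Hypothetical element type, ordered by (key, seq)."""

    def __init__(self, key, seq):
        self.key = key
        self.seq = seq  # running number that makes every element unique

    def _sort_key(self):
        return (self.key, self.seq)

    def __eq__(self, other):
        return self._sort_key() == other._sort_key()

    def __lt__(self, other):
        return self._sort_key() < other._sort_key()


rng = random.Random(0)
running = itertools.count()

# Each input must already be sorted before it is handed to heapq.merge().
list_a = sorted(Item(rng.randrange(1000), next(running)) for _ in range(25_000))
list_b = sorted(Item(rng.randrange(1000), next(running)) for _ in range(25_000))

merged = list(heapq.merge(list_a, list_b))
print(len(merged))  # expected to equal len(list_a) + len(list_b)
```

With this kind of input, the merged output should contain exactly as many items as the inputs combined, so a larger count points to a problem in how the inputs are built or passed.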

The output: while iterating over 2 lists with 25K items each, I got 100K results, double the amount put in.

I believe I followed all the requirements. Should I heapify the lists before passing them to heapq.merge()? The docs do not say so, and it is not clear whether or how that should be done.

Any clue?

rubmz
  • "The lists iterators themselves implement all that is required for the lists to be sorted" - that doesn't make sense. List iterators are a built-in type that you can't add methods to, and the iterators aren't the things that need to have comparison methods. – user2357112 Dec 20 '18 at 07:26
  • You don't have to pre-heapify the lists or anything. All we can tell is that you have some sort of bug. We need something we can run that demonstrates duplicates if we're to answer this. – user2357112 Dec 20 '18 at 07:27
  • I made sure each element has __lt__ and __eq__ methods, so the items can be sorted. Will update the question... – rubmz Dec 20 '18 at 07:29
  • I tested once. But will double check this... – rubmz Dec 20 '18 at 07:34

2 Answers

0

heapq.merge doesn't eliminate duplicates. View here for more information
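A minimal sketch of that behaviour, using two small made-up integer lists: values that appear in both inputs appear twice in the output.

```python
import heapq

a = [1, 2, 3]
b = [2, 3, 4]

# merge() yields every item from every input; equal values are all kept.
merged = list(heapq.merge(a, b))
print(merged)  # [1, 2, 2, 3, 3, 4]
```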

ycx
  • I never said it should eliminate duplicates... Actually, I made sure there aren't any duplicates. Will update the question. The problem is that even though there aren't any duplicates, it __emits__ duplicates! – rubmz Dec 20 '18 at 07:25
  • Have you tried the link inside that post: https://hg.python.org/cpython/file/default/Lib/heapq.py#l314 which goes into the exact code detail of the `merge` function? Perhaps you can find your answer from that code. Hope this helps – ycx Dec 20 '18 at 07:27
  • I already went into the heapq.merge code, which, just like the link you gave, is very complicated, and my student years are behind me :-) I would rather inspect my own bugs than this kind of algorithm. – rubmz Dec 20 '18 at 07:32
0

Okay, just to make clear what happened here: my mistake was to pass (carelessly...) the iterators of the lists, and not the lists themselves. Oddly, the function/compiler did not reject it! Once I passed the lists themselves, the function worked fine.
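One way to check a call like this is to pass the lists directly and compare the number of emitted items with the number put in. The helper below, `merge_and_check`, is a hypothetical name introduced only for this sketch, and the integer inputs are made up.

```python
import heapq
from collections import Counter


def merge_and_check(*sorted_lists):
    """Merge already-sorted lists and report anything emitted more than once.

    Hypothetical helper for debugging, not part of heapq.
    """
    merged = list(heapq.merge(*sorted_lists))
    expected = sum(len(lst) for lst in sorted_lists)
    # Count by object identity, so an object placed in two lists is flagged too.
    counts = Counter(id(item) for item in merged)
    repeated = [item for item in merged if counts[id(item)] > 1]
    return merged, expected, repeated


merged, expected, repeated = merge_and_check([1, 3, 5], [2, 4, 6])
print(len(merged) == expected)  # True when nothing was emitted twice
print(repeated)                 # [] for these inputs
```

If `len(merged)` exceeds `expected`, or `repeated` is non-empty, the inputs handed to the merge are worth inspecting before the merge algorithm itself.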

rubmz