Internal workings of Python's heapq.merge: how does it sort a list without generating the list?

Question

How does heapq.merge() produce sorted output without first generating the whole list?

I'm not sure I stated that clearly.
This question comes from the Super Ugly Number problem on LeetCode.

This Python code

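import heapq

class Solution(object):
    def nthSuperUglyNumber(self, n, primes):
        """
        :type n: int
        :type primes: List[int]
        :rtype: int
        """
        uglies = [1]
        # Lazily yields each known ugly number times `prime`; it keeps
        # producing values as `uglies` grows during iteration.
        def gen(prime):
            for ugly in uglies:
                yield ugly * prime
        # Merge the ascending streams (one per prime) into one ascending stream.
        merged = heapq.merge(*map(gen, primes))
        while len(uglies) < n:
            ugly = next(merged)
            # merge() keeps duplicates (e.g. 2*7 and 7*2), so skip repeats.
            if ugly != uglies[-1]:
                uglies.append(ugly)
        return uglies[-1]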

gave me a hard time. Even after reading up on "yield" and "heapq", I still don't understand how, in the while loop, merged knows that the values it has not generated yet will not be smaller than the ones already in uglies (for example, that no later value will be smaller than uglies[n-1]).

score 3 · Answer 1 · answered Feb 13 '16 at 10:00

The implementation of heapq.merge is pure Python, so you can read its code directly if you want.

As you might guess from the module it's implemented in, it uses a heap to merge the iterables it's passed. If the iterables (generators in this case) each yield their values in order, it will combine them so that the values it yields are also in order. It doesn't eliminate duplicate values, which is why the code you show checks to see if the latest value is equal to the previous one.
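To make the heap idea concrete, here is a minimal sketch of a heap-based k-way merge. It is not the CPython source: the name `simple_merge` and the tuple layout are assumptions made for illustration, and the real function also supports `key` and `reverse`. What it is meant to show is that only one pending value per input sits in the heap at any time, and that an input is advanced only after its value has been yielded.

```python
import heapq


def simple_merge(*iterables):
    """Simplified sketch of a heap-based merge of already-sorted inputs.

    Hypothetical helper for illustration only, not heapq.merge itself.
    """
    heap = []
    for index, iterable in enumerate(iterables):
        iterator = iter(iterable)
        try:
            first = next(iterator)
        except StopIteration:
            continue  # an empty input contributes nothing
        # `index` is unique, so ties on the value never compare iterators.
        heap.append((first, index, iterator))
    heapq.heapify(heap)

    while heap:
        value, index, iterator = heap[0]  # smallest pending value
        yield value
        # Only now pull the next item from the input that just produced
        # `value`; every other input stays untouched.
        try:
            following = next(iterator)
        except StopIteration:
            heapq.heappop(heap)  # this input is exhausted
        else:
            heapq.heapreplace(heap, (following, index, iterator))


print(list(simple_merge([1, 4, 7], [2, 5, 8], [2, 3, 9])))
# [1, 2, 2, 3, 4, 5, 7, 8, 9]
```

Note that the duplicate 2 survives, which matches the point above about merge not eliminating duplicates.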

Blckknght


Maybe it was in 2016, but it's not Python today. It's a thin wrapper around a relatively complex native heap implementation, and you need to read that code to see how it works. – Glenn Maynard Jul 17 '22 at 04:44
@GlennMaynard While some of the `heapq` code does indeed have accelerated C implementations, the `heapq.merge` function remains pure Python even today in the Python 3.11 betas. It does *call* some of the accelerated functions (and did several years ago too), but you can read the pure-python equivalents in the Python module to understand their purpose (the C versions just do the same things faster). The accelerator module `_heapq`'s source is [here](https://github.com/python/cpython/blob/main/Modules/_heapqmodule.c). – Blckknght Jul 17 '22 at 05:26

Mike Graham · Answer 2 · 2016-02-13T16:28:08.737

0

It takes n iterables, each already sorted. At every step it only has to compare the next unconsumed item from each iterable and yield the smallest of those. Because each iterable is sorted, its smallest remaining value is always its next item: first its first item, then its second, then its third, and so on.
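A small experiment can make the "only look at the next item of each input" behaviour visible. The sketch below wraps two sorted lists in a tracing generator (`traced` and `pulled` are hypothetical names introduced here) and records which values heapq.merge has actually requested so far. The expected results in the comments reflect the behaviour of CPython's implementation.

```python
import heapq

pulled = []  # every value that merge has actually requested, in order


def traced(name, values):
    """Yield `values` one by one, recording each request in `pulled`."""
    for value in values:
        pulled.append((name, value))
        yield value


merged = heapq.merge(traced("a", [1, 4, 9]), traced("b", [2, 3, 10]))

first = next(merged)
after_first = list(pulled)   # one item pulled from each input

second = next(merged)
after_second = list(pulled)  # only input "a" was advanced

print(first, after_first)    # 1 [('a', 1), ('b', 2)]
print(second, after_second)  # 2 [('a', 1), ('b', 2), ('a', 4)]
```

Nothing beyond the head of each input is ever consumed in advance, which is why the same function also works on infinite generators.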

edited Feb 13 '16 at 16:28

answered Feb 13 '16 at 06:07

Mike Graham


It's important to note that it actually works on *iterables*, not necessarily sequences. In the code in the question, the iterables to be merged are in fact infinite generators. – Blckknght Feb 13 '16 at 10:01
Thanks, I shouldn't SO at 1:30 in the morning ;) – Mike Graham Feb 13 '16 at 16:28
Thanks for answering!!! I guess my problem is this: how does the sequence produced by the generator know that the 1001st value, which it hasn't actually generated yet, has to be larger than the previous 1000 values? – Ye Zhang Feb 13 '16 at 22:24
@YeZhang, I'm not positive I understand the lasting confusion. `heapq.merge` knows because you promised it, and it would have bugs if you ended up lying to it and giving it non-sorted arguments. – Mike Graham Feb 13 '16 at 23:44
@MikeGraham Hi, I'm not talking about the primes list. I think I didn't quite understand how this works. For example, let's use primes = [2, 7, 13, 19]. According to the code, with n=6, shouldn't it generate the list [1,2,4,7,13,19]? I must have missed something, but I just can't figure it out by myself :( – Ye Zhang Feb 15 '16 at 20:12
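The example in the last comment can be checked directly. The sketch below is the question's code rewritten as a standalone function that returns the whole list instead of only the last element (the name `super_uglies` is an assumption for illustration). Each generator multiplies the growing, ascending `uglies` list by a fixed prime, so every input stream is itself ascending, which is the promise heapq.merge relies on.

```python
import heapq


def super_uglies(n, primes):
    """Return the first n super ugly numbers (hypothetical helper)."""
    uglies = [1]

    def gen(prime):
        # Iterating a list that grows during iteration picks up the new
        # items, so this stream never runs dry while the loop below runs.
        for ugly in uglies:
            yield ugly * prime

    merged = heapq.merge(*map(gen, primes))
    while len(uglies) < n:
        ugly = next(merged)
        if ugly != uglies[-1]:  # skip duplicates such as 2*7 and 7*2
            uglies.append(ugly)
    return uglies


print(super_uglies(6, [2, 7, 13, 19]))  # [1, 2, 4, 7, 8, 13]
```

The result contains 8 (that is, 4 * 2) before 13: once 4 has been appended to `uglies`, the generator for prime 2 yields 8, and the merge hands it out before the larger pending values 13 and 19.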
