As zip yields as many values as the shortest iterable given, I would have expected passing zero arguments to zip to return an iterable yielding infinitely many tuples, instead of returning an empty iterable.

This would have been consistent with how other monoidal operations behave:

>>> import itertools, math
>>> sum([])                            # sum
0
>>> math.prod([])                      # product
1
>>> all([])                            # logical conjunction
True
>>> any([])                            # logical disjunction
False
>>> list(itertools.product())          # Cartesian product
[()]

For each of these operations, the value returned when given no arguments is the identity value for the operation, which is to say, one that does not modify the result when included in the operation:

  • sum(xs) == sum([*xs, 0]) == sum([*xs, sum([])])
  • math.prod(xs) == math.prod([*xs, 1]) == math.prod([*xs, math.prod([])])
  • all(xs) == all([*xs, True]) == all([*xs, all([])])
  • any(xs) == any([*xs, False]) == any([*xs, any([])])

Or at least, one that gives a trivially isomorphic result:

  • itertools.product(*xs, itertools.product())
    ≡ itertools.product(*xs, [()])
    ≡ ((*x, ()) for x in itertools.product(*xs))
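
These identities can be checked directly. The following is only a sketch with made-up sample values (`nums`, `flags` and `lists` are illustrative names); since `sum`, `math.prod`, `all` and `any` each take an iterable, the empty case is spelled with `[]`:

```python
import itertools
import math

nums = [2, 3, 4]        # illustrative sample
flags = [True, False]   # illustrative sample

# Appending the empty-case result leaves each fold unchanged.
assert sum(nums) == sum([*nums, sum([])])
assert math.prod(nums) == math.prod([*nums, math.prod([])])
assert all(flags) == all([*flags, all([])])
assert any(flags) == any([*flags, any([])])

# For the Cartesian product, each tuple only gains a trailing ().
lists = [[1, 2], "ab"]  # illustrative sample
with_unit = list(itertools.product(*lists, itertools.product()))
plain = list(itertools.product(*lists))
assert with_unit == [(*x, ()) for x in plain]
```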

In the case of zip, this would have been:

zip(*xs, zip()) ≡ (f(x) for x in zip(*xs))

Because zip returns an n-tuple when given n arguments, it follows that zip() with 0 arguments must yield 0-tuples, i.e. (). This forces f to return (*x, ()) and therefore zip() to be equivalent to itertools.repeat(()). Another, more general law is:

((*x, *y) for x, y in zip(zip(*xs), zip(*ys))) ≡ zip(*xs, *ys)

which would have then held for all xs and ys, including when either xs or ys is empty (and does hold for itertools.product).
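
To make the difference concrete, here is a small check of that law for both functions. It is only a sketch: the helper names `product_law_holds` and `zip_law_holds` and the sample inputs are made up for illustration.

```python
import itertools

def product_law_holds(xs, ys):
    # Compare flattening a product of products with one combined product.
    lhs = [(*x, *y) for x, y in
           itertools.product(itertools.product(*xs), itertools.product(*ys))]
    rhs = list(itertools.product(*xs, *ys))
    return lhs == rhs

def zip_law_holds(xs, ys):
    # The same comparison, with zip in place of itertools.product.
    lhs = [(*x, *y) for x, y in zip(zip(*xs), zip(*ys))]
    rhs = list(zip(*xs, *ys))
    return lhs == rhs

xs = [[1, 2], [3, 4]]   # illustrative sample
ys = [["a", "b"]]       # illustrative sample

print(product_law_holds(xs, ys))  # True
print(product_law_holds([], ys))  # True
print(zip_law_holds(xs, ys))      # True
print(zip_law_holds([], ys))      # False: zip() is empty, so the left side is too
```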

Yielding empty tuples indefinitely is also the behaviour that falls out of this straightforward reimplementation:

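def my_zip(*iterables):
    # Obtain an iterator for every argument up front.
    iterables = tuple(map(iter, iterables))
    while True:
        item = []
        for it in iterables:
            try:
                item.append(next(it))
            except StopIteration:
                # The first exhausted iterator ends the whole zip.
                return
        # With no iterables the for loop never runs, so () is yielded forever.
        yield tuple(item)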

which means that the case of zip with no arguments must have been specifically special-cased not to do that.
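
For illustration, running that reimplementation next to the built-in shows the difference. This is a sketch: the function is repeated so the snippet runs on its own, and `itertools.islice` is used only to cut the infinite stream short.

```python
import itertools

def my_zip(*iterables):
    iterables = tuple(map(iter, iterables))
    while True:
        item = []
        for it in iterables:
            try:
                item.append(next(it))
            except StopIteration:
                return
        yield tuple(item)

# With arguments, both agree.
print(list(my_zip([1, 2], "ab")))  # [(1, 'a'), (2, 'b')]

# With no arguments, the reimplementation behaves like itertools.repeat(()).
first_three = list(itertools.islice(my_zip(), 3))
print(first_three)                 # [(), (), ()]

# The built-in yields nothing at all.
print(list(zip()))                 # []
```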

Why is zip() not equivalent to itertools.repeat(()) despite all the above?

user3840170
Hui
  • Why would you expect that? `zip` stops when the shortest iterator ends. If there are no iterators, then the shortest one was of zero length, so the output will be zero length. – Tim Roberts Mar 21 '22 at 17:30
  • @TimRoberts It is useful for any "product"-like operation to return the "identity" element when passed zero arguments. For example, the identity element for logical AND is `True`, so `all([])` returns `True` – Hui Mar 21 '22 at 17:38
  • @TimRoberts, another example: the "identity" element for the Cartesian product operation is a range of exactly one "void" element, so `list(itertools.product())` returns a range of exactly one element, an empty tuple – Hui Mar 21 '22 at 17:41
  • I think this is off-topic for Stackoverflow but might fit on Software Engineering? See https://meta.stackoverflow.com/questions/276366/are-questions-about-programming-history-in-scope-for-stack-overflow – Stuart Mar 21 '22 at 17:41
  • @TimRoberts "If there are no iterators, then the shortest one was of zero length" - if there are no iterators then there is no shortest one and we can say nothing about its length, so this assertion makes no sense. But I see no reason for the OP's assertion that the length should be infinite either. – Stuart Mar 21 '22 at 17:46
  • See https://meta.stackoverflow.com/questions/323334/is-asking-why-on-language-specifications-still-considered-as-primarily-opinio and https://meta.stackoverflow.com/questions/260711/should-why-language-feature-designed-particular-way-be-closed-moved. – Stuart Mar 21 '22 at 18:08
  • Think about the logic inside `zip`. At each step, `zip` asks, "do any of my iterators have anything else to produce?" When given no iterators, clearly the answer to that question is "no". – Tim Roberts Mar 21 '22 at 18:08
  • @Stuart, I think this is more of a math question which clearly has an answer. – Hui Mar 21 '22 at 18:17
  • @Stuart, I edited the question with examples. It is not about the design of the language; it is about the correctness of the math behind it. – Hui Mar 21 '22 at 18:21
  • I think this question *is* about the design/history of the language, but it's interesting and I'd like it answered. Another way to phrase it: "Mathematically, `f()`-with-no-args should always return `f`'s identity element... so when learning `zip` you might _think_ that `zip()`-with-no-args will return its identity element which is `repeat(())`... but in fact _it does not_. What's a good way to explain this design choice to a student of Python? How does it fit in with the rest of the language?" One possible answer is "it's just a bug," but that sounds like an extraordinary claim. – Quuxplusone Mar 21 '22 at 18:34
  • If I understand correctly, there is no valid identity for `zip`. `zip(repeat(()), range(3))` yields `((), 0), ((), 1), ((), 2)`, which is clearly different from `zip(range(3))`. The same is true for `product`: `product([()], x)` does not yield the same as `product(x)`. It's not obvious whether the Python designers purposely used the identity for the other behaviors you list. Also what is the justification that the identity (if it existed) would be infinite in length? Like an identity matrix or zero vector, wouldn't it only be defined for a specific size? – Stuart Mar 21 '22 at 21:43
  • @Stuart as I stated when describing product's behaviour, product/zip does not return the same type as its input. The result of an empty product can only be `product(identity)`, where the identity of product is exactly one element of nothing, so `product(identity)` is exactly one empty tuple. Similarly, `zip()` can in theory be `zip(identity)`, where the identity of zip is an infinite range of nothing, so `zip(identity)` is an infinite range of empty tuples – Hui Mar 22 '22 at 00:55
  • @TimRoberts But the question `zip` asks is actually ‘is any of my iterators exhausted yet’, which is never true when there are no iterators. – user3840170 Aug 20 '22 at 12:11
  • @TimRoberts no, if there are no iterators, the shortest length among them is not 0. Exactly as the minimum of an empty subset of natural numbers (when that has a definition) is never defined as 0, but as infinity instead. – jthulhu Aug 20 '22 at 13:19
  • @BlackBeans -- Python is not attempting to implement set theory. As Stuart describes in his excellent answer, Python is implementing the "principle of least astonishment". To a practical programmer, zipping nothing should result in nothing. – Tim Roberts Aug 20 '22 at 19:06
  • @TimRoberts The same principle of least astonishment would have `0 % 2` throw an exception because [common folk cannot agree on the parity of zero](//en.wikipedia.org/wiki/Parity_of_zero). – user3840170 Aug 21 '22 at 09:20
  • No, that's a poor example. The first two sentences tell the story of that claim. – Tim Roberts Aug 21 '22 at 23:37

1 Answer

PEP 201 and related discussion show that zip() with no arguments originally raised an exception. It was changed to return an empty list because this is more convenient for some cases of zip(*s) where s turns out to be an empty list. No consideration was given to what might be the 'identity', which in any case appears difficult to define with respect to zip - there is nothing you can zip with arbitrary x that will return x.
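
As an illustration of the kind of `zip(*s)` use where an empty result is convenient, consider a transpose helper. This is a sketch, not an example taken from the PEP discussion, and `transpose` is a hypothetical name:

```python
def transpose(rows):
    # Hypothetical helper: swap the rows and columns of a list of lists.
    return [list(col) for col in zip(*rows)]

print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
print(transpose([]))                      # []
```

If `zip()` behaved like `itertools.repeat(())`, the second call would never terminate, because the list comprehension would consume an endless stream of empty tuples.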

It is not clear why certain commutative and associative mathematical functions were originally made to return the identity when applied to an empty list; the choice may have been driven by convenience, the principle of least astonishment, and the history of earlier languages like Perl or ABC. Explicit reference to the concept of mathematical identity is rarely if ever made (see e.g. Reason for "all" and "any" result on empty lists). So there is no reason to expect functions in general to do this. In many cases it would be less surprising for them to raise an exception instead.

Stuart