In 3.x, zip returns a special sort of iterator, not a list. The documentation explains:
zip() is lazy: The elements won’t be processed until the iterable is iterated on, e.g. by a for loop or by wrapping in a list.
This means that the result can't be indexed, so old code that attempts to index or slice the result of zip will fail with a TypeError. Simply passing the result to list produces a list, which can be used as it was in 2.x.
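As a minimal sketch of that fix (the variable names here are purely illustrative), the following shows the indexing failure on a zip object and the same operations working once the result is wrapped in list:

```python
pairs = zip('abc', [1, 2, 3])

error_message = None
try:
    pairs[0]  # zip objects support neither indexing nor slicing
except TypeError as exc:
    error_message = str(exc)

# Materialize the result once; the list supports indexing and slicing.
pairs_list = list(zip('abc', [1, 2, 3]))
first = pairs_list[0]
tail = pairs_list[1:]
```

Here first is the tuple ('a', 1) and tail holds the remaining two pairs.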
It also means that iterating over the zip result a second time will not find any elements, because the iterator is already exhausted. Thus, if the data needs to be reused, create a list once and reuse that list; calling list on the same zip object again produces an empty list:
>>> example = zip('flying', 'circus')
>>> list(example)
[('f', 'c'), ('l', 'i'), ('y', 'r'), ('i', 'c'), ('n', 'u'), ('g', 's')]
>>> list(example)
[]
This iterator is implemented as an instance of a class...
>>> example = zip('flying', 'circus')
>>> example
<zip object at 0x7f76d8365540>
>>> type(example)
<class 'zip'>
>>> type(zip)
<class 'type'>
... which is built-in:
>>> class example(int, zip): pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: multiple bases have instance lay-out conflict
>>> # and that isn't caused by __slots__ either:
>>> zip.__slots__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'zip' has no attribute '__slots__'
(See also: TypeError: multiple bases have instance lay-out conflict, Cannot inherit from multiple classes defining __slots__?)
The key advantage of this is that it saves memory, and allows for short-circuiting when the inputs are also lazy. For example, corresponding lines of two large input text files can be zipped together and iterated, without reading the entire files into memory:
with open('foo.txt') as f, open('bar.txt') as g:
    for foo_line, bar_line in zip(f, g):
        print(f'{foo_line:.38} {bar_line:.38}')
        if foo_line == bar_line:
            print('^ found the first match ^'.center(78))
            break
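The same short-circuiting can be observed without any files. The sketch below is only an illustration: noisy is a hypothetical helper (not part of the standard library) that records each value as it is pulled, and itertools.count() stands in for an arbitrarily large lazy input.

```python
import itertools

consumed = []

def noisy(values):
    """Hypothetical helper: yield each value, recording what was pulled."""
    for value in values:
        consumed.append(value)
        yield value

# itertools.count() never ends, but zip only pulls items on demand,
# so breaking out of the loop stops consumption of both inputs.
for number, letter in zip(itertools.count(), noisy('abcdef')):
    if letter == 'c':
        break
```

After the loop, consumed contains only 'a', 'b' and 'c'; the remaining letters were never requested.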