I noticed some weird behaviour when I copy.copy an itertools.chain:

from copy import copy
from itertools import chain

When I exhaust one of them the result is as expected:

>>> a = chain([1,2,3], [4,5,6])
>>> b = copy(a)
>>> list(a), list(b)
([1, 2, 3, 4, 5, 6], [])

>>> a = chain([1,2,3], [4,5,6])
>>> b = copy(a)
>>> list(b), list(a)
([1, 2, 3, 4, 5, 6], [])

However when I use next the results seem odd:

>>> a = chain([1,2,3], [4,5,6])
>>> b = copy(a)
>>> next(a), list(b), list(a)
(1, [4, 5, 6], [2, 3])   # b "jumps" to the second iterable...

>>> a = chain([1,2,3], [4,5,6])
>>> next(a)
1
>>> b = copy(a)
>>> next(a), next(b), next(a)
(2, 3, 4)
>>> next(b)   # b is empty
StopIteration:
>>> next(a)   # a is not empty
5

Is that a bug, or is shallow copying an iterator generally a bad idea? I noticed that a copy of iter and a copy of zip behave differently too:

>>> a = zip([1,2,3], [4,5,6])
>>> b = copy(a)
>>> next(a), next(b)
((1, 4), (2, 5))  # copies share the same "position"

>>> a = iter([1,2,3])
>>> b = copy(a)
>>> next(a), next(b)
(1, 1)   # copies don't share the same "position"
MSeifert
  • use `tee` to "copy" your generator: http://stackoverflow.com/questions/21315207/deep-copying-a-generator-in-python – Jean-François Fabre Mar 23 '17 at 21:33
  • also http://stackoverflow.com/questions/3826746/pythonic-way-of-copying-an-iterable-object – Jean-François Fabre Mar 23 '17 at 21:37
  • @Jean-FrançoisFabre Actually I expected something that works like a shallow copy of `zip` (shares position between iterators). That's not really achievable using `itertools.tee`, `copy.deepcopy` or `pickle`. If I wanted a deep copy I wouldn't have used a shallow copy operation `copy.copy` to begin with :) – MSeifert Mar 23 '17 at 21:37
  • I didn't close the question because I had a doubt and you know a lot about Python & SO already. But I wouldn't copy an iterator. I read some Q&A (cannot find it now) which explained that it's bound to fail. Iterators can be a lot of different things. Some have the `__next__` attribute, some don't... I'm no expert but I feel that it's an XY problem :) – Jean-François Fabre Mar 23 '17 at 21:38
  • Well, the question is about the root-cause for a bug I was trying to debug and fix. So yes, maybe an XY problem because the answer seems to be "[...] I wouldn't copy an iterator. I read some Q&A (cannot find it now) which explained that it's bound to fail.". – MSeifert Mar 23 '17 at 21:48
  • not a lot of people can answer that one. good luck (my short answer to "Is that a Bug or is shallow copying an iterator generally a bad idea?" is yes BTW :)) – Jean-François Fabre Mar 23 '17 at 21:49

1 Answer

You are mixing up nested iterables with simple iterables.

For your first example, you just need to use deepcopy (from the copy module) to create an independent copy of your iterator:

In [87]: a = chain([1,2,3], [4,5,6])

In [88]: b = deepcopy(a)

In [89]: list(a)
Out[89]: [1, 2, 3, 4, 5, 6]

In [90]: list(b)
Out[90]: [1, 2, 3, 4, 5, 6]

There is nothing special about next either. Here is the rough equivalent of the chain function from the Python documentation:

def chain(*iterables):
    # chain('ABC', 'DEF') --> A B C D E F
    for it in iterables:
        for element in it:
            yield element

As you can see, the outer for loops over the iterables, which in this case are [1,2,3] and [4,5,6]. A shallow copy of the chain object shares that outer iteration with the original: the first call to next takes the next iterable from it and only then starts iterating over that iterable's items. So by the time next(a) returns, the first iterable has already been taken, and that's why list(b) returns [4, 5, 6].

Again, if you use deepcopy you won't see this behavior anymore:
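The sharing described above can be reproduced without relying on how a particular Python version copies itertools objects. The sketch below uses a hypothetical pure-Python class, `Chain`, with attribute names `source` and `active` chosen for illustration; it is a stand-in for the idea, not the real implementation of `itertools.chain`. A shallow copy of an instance copies the attribute references, so both objects pull from the same `source` iterator.

```python
from copy import copy


class Chain:
    """Hypothetical pure-Python stand-in for itertools.chain (illustration only)."""

    def __init__(self, *iterables):
        self.source = iter(iterables)  # iterator over the input iterables
        self.active = None             # iterator over the current iterable

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            if self.active is None:
                # Take the next iterable; StopIteration here ends the chain.
                self.active = iter(next(self.source))
            try:
                return next(self.active)
            except StopIteration:
                self.active = None


a = Chain([1, 2, 3], [4, 5, 6])
b = copy(a)        # shallow: a and b share the same `source` iterator
first = next(a)    # a takes [1, 2, 3] from the shared source
rest_b = list(b)   # b can only take what is left in the source
rest_a = list(a)   # a finishes the iterable it already took
print(first, rest_b, rest_a)  # 1 [4, 5, 6] [2, 3]
```

This matches the `(1, [4, 5, 6], [2, 3])` result from the question: the copy did not "jump", it simply found the first iterable already taken.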

In [94]: a = chain([1,2,3], [4,5,6])

In [95]: b = deepcopy(a)

In [96]: next(a), list(b), list(a)
Out[96]: (1, [1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6])

The same is true for zip, since you are passing multiple iterables to the function. If you use deepcopy, you end up with independent objects:

In [100]: a = zip([1,2,3], [4,5,6])

In [101]: b = deepcopy(a)

In [102]: next(a), next(b)
Out[102]: ((1, 4), (1, 4))

copy works fine for iter, though, since you're passing just one iterable to the function, so there is no need for deepcopy.
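As a small sketch of the iter case (the variable names are illustrative, and the behavior shown is CPython's): the copy gets its own position, but the list underneath is not duplicated, so both iterators still see later changes to it.

```python
from copy import copy

data = [1, 2, 3]
a = iter(data)
b = copy(a)          # new list iterator with its own position

pair = (next(a), next(b))
print(pair)          # (1, 1)

data.append(4)       # the list itself is shared, only the position was copied
tail_a, tail_b = list(a), list(b)
print(tail_a, tail_b)  # [2, 3, 4] [2, 3, 4]
```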

All that said, the best (most Pythonic) way to copy a generator is itertools.tee:

In [103]: from itertools import tee

In [104]: a = zip([1,2,3], [4,5,6])

In [105]: a, b = tee(a)

In [106]: list(a)
Out[106]: [(1, 4), (2, 5), (3, 6)]

In [107]: list(b)
Out[107]: [(1, 4), (2, 5), (3, 6)]

In [108]: a = chain([1,2,3], [4,5,6])

In [109]: a, b = tee(a)

In [110]: list(a)
Out[110]: [1, 2, 3, 4, 5, 6]

In [111]: list(b)
Out[111]: [1, 2, 3, 4, 5, 6]
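To tie this back to the next-based examples in the question, here is a minimal sketch of how the two tee results behave when only one of them is advanced. Each keeps its own position, so this gives independent iterators rather than the shared position seen with a shallow copy of zip. Note that the documentation advises against using the original iterator once tee has been called on it.

```python
from itertools import chain, tee

a, b = tee(chain([1, 2, 3], [4, 5, 6]))

first = next(a)   # advancing a does not move b
all_b = list(b)   # b still yields every element
rest_a = list(a)  # a continues from where it stopped
print(first, all_b, rest_a)  # 1 [1, 2, 3, 4, 5, 6] [2, 3, 4, 5, 6]
```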
Mazdak
  • Thank you for the answer. It makes sense, however I don't get the point about passing in multiple iterables. It may be splitting hairs but I pass in the iterables as themselves `chain(a, b)` or `zip(a, b)` so why would I expect a shallow copy not to copy the iterators themselves? I wouldn't expect it if I passed them in as a data structure like `chain([a, b])` or `zip([a, b])` (these don't work obviously, just to illustrate the point). However, Python generators are a completely different topic because they can't be copied anyway. But am I right that you mean "don't shallow copy iterators!"? – MSeifert Mar 23 '17 at 22:05
  • As a matter of fact, `a, b` is a tuple. But that has nothing to do with how you should copy the iterator; it comes down to how `copy` works. I'm also not saying that you shouldn't shallow copy iterators. Functions like `chain` and `zip`, which iterate over the iterables first, cannot be copied properly with the `copy` function, just like a nested list. – Mazdak Mar 23 '17 at 22:16
  • @MSeifert Nevertheless, if you're not satisfied yet, you should check out the source of `copy` and `deepcopy` to see how they actually work. I'm not exactly sure how `copy` deals with iterator objects. – Mazdak Mar 23 '17 at 22:19