
Among the best-known features of functional programming are lazy evaluation and infinite lists. In Python, one generally implements these features with generators. But one of the precepts of functional programming is immutability, and generators are not immutable. Just the opposite. Every time one calls next() on a generator, it changes its internal state.

A possible workaround would be to copy a generator before calling next() on it. That works for some objects, such as count(). (Perhaps count() is not a generator?)

from copy import copy
from itertools import count

count_gen = count()
count_gen_copy = copy(count_gen)
print(next(count_gen), next(count_gen), next(count_gen))  # => 0 1 2
print(next(count_gen_copy), next(count_gen_copy), next(count_gen_copy))  # => 0 1 2

But if I define my own generator, e.g., my_count(), I can't copy it.

from copy import copy


def my_count(n=0):
    while True:
        yield n
        n += 1


my_count_gen = my_count()
my_count_gen_copy = copy(my_count_gen)
print(next(my_count_gen), next(my_count_gen), next(my_count_gen))
print(next(my_count_gen_copy), next(my_count_gen_copy), next(my_count_gen_copy))

I get an error message when I attempt to execute copy(my_count_gen): TypeError: can't pickle generator objects.

Is there a way around this, or is there some other approach?

Perhaps another way to ask this is: what is copy() copying when it copies count_gen?

Thanks.

P.S. If I use __iter__() rather than copy(), the __iter__() version acts like the original.

my_count_gen = my_count()
my_count_gen_i = my_count_gen.__iter__()
print(next(my_count_gen), next(my_count_gen), next(my_count_gen))  # => 0 1 2
print(next(my_count_gen_i), next(my_count_gen_i), next(my_count_gen_i))  # => 3 4 5
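A quick identity check (my own sketch, not part of the original run) suggests why: calling __iter__() on a generator returns the very same generator object, so both names advance one shared state.

```python
def my_count(n=0):
    while True:
        yield n
        n += 1

g = my_count()
print(g.__iter__() is g)  # => True: __iter__() returns the same generator object
```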
RussAbbott
    `count()` is not a generator. If you think something is a generator, but it's copyable, then it's not a generator. – user2357112 Sep 14 '20 at 22:24
    You've been misled by people using sloppy terminology to talk about *iterators*. This is one of the reasons correct terminology matters. – user2357112 Sep 14 '20 at 22:26
  • You can actually try using `tee` from itertools. Remember to replace the "original" iterator with one of the resulting copies. – aragaer Sep 14 '20 at 22:52
    Not all function languages use lazy evaluation, nor is lazy evaluation restricted to functional languages. – chepner Sep 14 '20 at 22:54
  • Thanks. `itertools.tee` is a clever solution. I hadn't thought of that. @aragaer makes an important point. When calling `tee` it's important that one of the returned iterators replaces the original and that one never calls `next` on the original object from then on. – RussAbbott Sep 16 '20 at 00:01

2 Answers


There's no way to copy arbitrary generators in Python. The operation just doesn't make sense. A generator could depend on all sorts of other uncopyable resources, like file handles, database connections, locks, worker processes, etc. If a generator is holding a lock and you copied it, what would happen to the lock? If a generator is in the middle of a database transaction and you copy it, what would happen to the transaction?

The things you thought were copyable generators aren't generators at all. They're instances of other iterator classes. If you want to write your own iterator class, you can:

class MyCount:
    def __init__(self, n=0):
        self._n = n
    def __iter__(self):
        return self
    def __next__(self):
        retval = self._n
        self._n += 1
        return retval

Some iterators you write that way might even be reasonably copyable. For others, copy.copy will do something completely unreasonable and useless.
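For instance (a sketch building on the class above), a shallow copy.copy duplicates the instance's _n attribute, so the copy resumes independently from the point where it was made:

```python
import copy

class MyCount:
    def __init__(self, n=0):
        self._n = n
    def __iter__(self):
        return self
    def __next__(self):
        retval = self._n
        self._n += 1
        return retval

original = MyCount()
print(next(original), next(original))  # => 0 1

clone = copy.copy(original)  # shallow copy duplicates the _n attribute
print(next(original))        # => 2 (the original keeps advancing)
print(next(clone))           # => 2 (the clone resumes from the copied state)
```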

user2357112
  • I assume what the OP wants and expects is something like Clojure's lazy sequences -- where holding a reference to an instance of the sequence that isn't the most recent item to exist forces all items between the oldest referenced state and the most recent state to be retained in memory for later replay. Given this primitive, it's easy to run out of memory by holding a reference to the head of an infinite sequence... but for purposes of interactions with stateful components (file handles, sockets, etc), it's only the latest point in the sequence to ever be calculated that matters. – Charles Duffy Sep 14 '20 at 22:53
  • ...AFAICT, `itertools.tee` is pretty close to that behavior. – Charles Duffy Sep 14 '20 at 22:57
  • As I understand it `itertools.tee` works on arbitrary iterators. Since generators are iterators, it should work on generators as well. Right? The issue of non-local resources isn't relevant as long as one is willing to have all the `tee`ed copies `yield` the same sequence--since that seems to be all that's going on. The `yield`ed results are cached for use of the other `tee`ed copies. Or am I missing something? – RussAbbott Sep 16 '20 at 00:32
  • @RussAbbott: You can tee any iterator, and if caching the results is really what you want rather than making a copy, then teeing makes sense. – user2357112 Sep 16 '20 at 00:37

While copy doesn't make sense on a generator, you can effectively "copy" an iterator so that you can iterate it many times. The easiest way is to use tee from the itertools module.

import itertools


def my_count(n=0):
    while True:
        yield n
        n += 1

a, b, c = itertools.tee(my_count(), 3)

# now use a, b, c ...

tee caches the underlying iterator's results in memory so that each returned iterator can yield them independently.
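A short usage sketch (variable names follow the snippet above) shows the independent iteration. Note the caveat from the comments: after teeing, avoid calling next() on the original generator, because values it consumes directly are never seen by the tees.

```python
import itertools

def my_count(n=0):
    while True:
        yield n
        n += 1

a, b, c = itertools.tee(my_count(), 3)

print(next(a), next(a), next(a))  # => 0 1 2
print(next(b))                    # => 0 (b is unaffected by advancing a)
print(next(c), next(c))           # => 0 1
```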

bj0