Overwriting an iterator using itertools.tee

Question

I would like to use itertools.tee inside of a function, with the original iterator as an argument, but I am concerned that I may be reusing the old iterator when I exit the function, which one is not supposed to do when using tee.

If I call tee in the same block as the iterator, then it seems safe:

my_iter = create_some_iterator()

my_iter, my_lookahead = itertools.tee(my_iter)

because the original iterator that my_iter pointed to has (I assume) no remaining references, and my_iter now points to its duplicate, so there's no way to use the original iterator.

But is this still true if I pass it through a function?

def foo(some_iter):
    some_iter, some_lookahead = itertools.tee(some_iter)
    # Do some lookahead tasks

my_iter = create_some_iterator()
foo(my_iter)
next(my_iter)   # Which iter is this?

Does my_iter point to the copy of my_iter after leaving the function? Or does it still point to the original iterator, which I am not supposed to use?

I am concerned because most of the time this is not a problem, but there are occasions where I have been caught by this, particularly in less common implementations like PyPy.

This is what id tells me in the example above, which suggests that I cannot use iterators in this way, but I may also be misinterpreting what id means here:

import itertools

def foo(some_iter):
    print('  some_iter id:', id(some_iter))
    some_iter, some_lookahead = itertools.tee(some_iter)

    print('  new some_iter id:', id(some_iter))
    print('  some_lookahead id:', id(some_lookahead))
    # Do some lookahead tasks

my_iter = iter(range(10))
print('my_iter id:', id(my_iter))
foo(my_iter)

print('my_iter id after foo:', id(my_iter))

Output:

my_iter id: 139686651427120
  some_iter id: 139686651427120
  new some_iter id: 139686650411776
  some_lookahead id: 139686650411712
my_iter id after foo: 139686651427120

my_iter still has its original id, not the one assigned to some_iter by tee.

UPDATE: Sorry, this was not the question I meant to ask. I more or less answer it myself in the second part.

I was really asking why it still seems to work as expected, with iterations in the copy being reflected in the original, even though they have different IDs.

Also was half-trying to ask how to handle this problem but this answer provides a solution.

I tried to scale back the question, but scaled it back too much.

I tried to close this question, but it won't let me anymore, so not sure how to handle this. Apologies to those who already answered.

score 0 · Answer 1 · answered Mar 27 '20 at 15:00

Does my_iter point to the copy of my_iter after leaving the function? Or does it still point to the original iterator, which I am not supposed to use?

It still points to the original. Python passes arguments by value, though every value is a reference to an object, so it's a bit confusing sometimes. It is not a pass-by-reference language: assigning to a parameter is purely local to the function and invisible to the caller.
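One way to make the caller see the new iterator is to return it and rebind the name at the call site. The following is only a minimal sketch of that idea; the helper name peek_first and the variables original and value are made up for illustration and are not from the question.

```python
import itertools

def peek_first(some_iter):
    # Rebinding some_iter here is local; the caller's name is unaffected.
    some_iter, lookahead = itertools.tee(some_iter)
    first = next(lookahead, None)  # look ahead without consuming some_iter
    # Return the tee'd iterator so the caller can replace its own reference.
    return first, some_iter

my_iter = iter(range(10))
original = my_iter                    # kept only to show what happens to it
first, my_iter = peek_first(my_iter)  # rebind my_iter to the tee'd iterator

value = next(my_iter)
print(first)                # 0, the value seen by the lookahead
print(my_iter is original)  # False, my_iter now names the tee'd iterator
print(value)                # 0, the lookahead did not consume it
```

After the call, my_iter no longer names the original iterator, which is the same situation as calling tee in the caller's own block.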

score 0 · Answer 2 · answered Mar 27 '20 at 15:03

In Python, passing something to a function never makes a copy.

def identity(x):
    return x

iter1 = iter(range(10))
iter2 = identity(iter1)
assert iter1 is iter2  # "is" checks object identity: both names refer to the same object

Python functions always work this way.

The behavior of itertools.tee is irrelevant. The fact that we are dealing with iterators specifically is irrelevant.

The fact that itertools.tee returns new iterator objects is behavior specific to itertools.tee.
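As a rough illustration of why the ids differ and yet iteration in a tee'd iterator shows up in the original: the iterators returned by tee are new objects, but they draw their items from the original iterator. The variable names below are invented for this sketch.

```python
import itertools

source = iter(range(5))
a, b = itertools.tee(source)

# tee returns new iterator objects, distinct from the source and each other.
distinct = a is not source and b is not source and a is not b

# Advancing one branch pulls items from the shared source.
first_from_a = next(a)           # 0, fetched from source and buffered for b
next_from_source = next(source)  # 1, the source has already moved past 0
first_from_b = next(b)           # 0, served from tee's buffer
rest_of_a = list(a)              # [2, 3, 4], the 1 went to whoever called next(source)

print(distinct, first_from_a, next_from_source, first_from_b, rest_of_a)
```

The item taken directly from source never reaches a or b, which is why the original should not be used once it has been passed to tee.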

If you have some free time, this talk can be very enlightening: https://www.youtube.com/watch?v=_AEJHKGk9ns
