
I have a Receiver object that will sometimes accumulate a queue of packets that are consumed as they are processed. It seems reasonable to make this receiver support the iterator protocol, so next( receiver ) will manually retrieve the next packet, and for packet in receiver will iterate through the currently available packets. Intuitively, it would be fine to do such an iteration once, going through all the available packets until the receiver stops the for loop by raising StopIteration (which is the standard way for iterators to tell for loops it's time to stop), and then later use such a for loop again to go through whatever new packets have arrived in the interim.
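
For concreteness, here's a minimal sketch of the sort of wrapper I mean. (Everything here is made up for the demo: a deque stands in for the C library's internal queue, and the _deliver helper stands in for packets arriving from the network.)

from collections import deque

class Receiver:
    """Sketch: a stop-and-go iterator over a consuming packet queue."""

    def __init__(self):
        self._queue = deque()          # stand-in for the C-side queue

    def _deliver(self, *packets):      # stand-in for packets arriving
        self._queue.extend(packets)

    def __iter__(self):
        return self

    def __next__(self):
        if not self._queue:
            raise StopIteration        # nothing available *right now*
        return self._queue.popleft()   # consumes, like the C library

receiver = Receiver()
receiver._deliver("p1", "p2")
for packet in receiver:
    print(packet)                      # p1, p2 -- then the loop ends
receiver._deliver("p3")
print(next(receiver))                  # p3 -- the iterator has resumed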

However, the Python docs say:

Once an iterator’s __next__() method raises StopIteration, it must continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken.

Even though this code is supposedly "deemed broken", it works just fine as far as I can tell. So I'm wondering: how bad is it to have code that seemingly works fine, and works the way one would intuitively expect an iterator to work, but is somehow "deemed broken"? Is there something actually broken about returning more items after you've raised StopIteration? Is there some reason I should change this?

(I recognize that I could make the receiver a mere iterable (whose __iter__ method would produce some other iterator) rather than an iterator itself (with its own __next__ method), but (a) this wouldn't support the familiar intuitive use of next( receiver ) to pop the next packet off the queue, and (b) it seems wasteful and inefficient to repeatedly spawn new iterator objects when I already have a perfectly fine iterator-like object whose only fault is that it is apparently "deemed broken", and (c) it would be misleading to present the receiver as a sort of iterable container, since the receiver consumes the packets as it retrieves them (behavior built into the C library that I'm wrapping, and I don't think it makes sense to start caching packets in Python too), so if somebody tried to make multiple iterators to traverse the receiver's queue at their own pace, the iterators would steal items from each other (as the sketch below illustrates) and yield much more confusing results than anything I can see arising from presenting this as a single stop-and-go iterator rather than as an iterable container.)
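
To illustrate point (c), here's a toy sketch of the design I'm rejecting. (Again, the names are invented and a plain list stands in for the C queue.) Every "independent" iterator drains the same underlying queue, so two of them interleave destructively:

shared = ["p1", "p2", "p3", "p4"]      # stand-in for the C library's queue

class StealingIterator:
    """Each supposedly independent iterator consumes the same shared queue."""
    def __iter__(self):
        return self
    def __next__(self):
        if not shared:
            raise StopIteration
        return shared.pop(0)           # consumed: no other iterator sees it

a, b = StealingIterator(), StealingIterator()
print(next(a), next(b), next(a), next(b))   # p1 p2 p3 p4 -- interleaved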

  • I'd say it's one of those you-have-been-warned situations. Yes, resuming iterability does *something*, and it might even do what you want, and it might even do that without any errors or side effects. But it also goes against how the iterator protocol is expected to behave, and that *may* have weird side effects in *some* situations (which you may not have encountered yet and may never encounter). – deceze Nov 28 '21 at 12:32
  • https://bytes.com/topic/python/answers/839784-why-broken-iterators-broken (feel free to use in an answer). – Kelly Bundy Nov 28 '21 at 21:53
  • [Good answers](https://stackoverflow.com/questions/51862462/is-the-file-object-iterator-broken) (including a better link to the mail archive). – Kelly Bundy Nov 28 '21 at 22:10
  • Questions of the form "why doesn't a feature on language X do what I think it should do instead of what X's designers thought it should do" are basically a way of writing an opinion in the form of a question, and the answers tend to also be opinions. All this is contrary to [SO guidelines](https://stackoverflow.blog/2010/09/29/good-subjective-bad-subjective/). – rici Nov 28 '21 at 22:13
  • It isn't just a matter of opinion to ask whether there's some hidden danger in using code that seems to work fine, but also seems to go against the official docs. (If the question was, "Wouldn't it be cool if Python changed this?" that would be a matter of opinion. But that's not what the OP asked.) – JustinFisher Nov 28 '21 at 22:21
  • The first `get_first_and_last_elements` [here](https://bugs.python.org/issue23455) is a good example of a danger (sketched after this comment thread). – Kelly Bundy Nov 28 '21 at 22:26
  • That sort of behavior should be expected from a stop-and-go iterator, so I'd view it more as "working as intended" rather than a "danger", at least if the documentation for the iterator object in question makes clear that's what to expect. I'm more worried that maybe stop-and-go iterators work only in certain implementations of Python, or can fail if a "stopped" iterator gets swept up by garbage collection, or something like that... – JustinFisher Nov 28 '21 at 22:44
  • "this wouldn't support the familiar intuitive use of `next( receiver )` to pop the next packet off the queue" - that's *not* a familiar, intuitive use of `next`. None of Python's stdlib queues use `next` like that, including `collections.deque`, `queue.Queue`, and `multiprocessing.Queue`. – user2357112 Nov 30 '21 at 04:49
  • "it seems wasteful and inefficient to repeatedly spawn new iterator objects" - creating iterator objects is not going to be the bottleneck. You're prematurely optimizing, and targeting the wrong things to optimize. – user2357112 Nov 30 '21 at 04:51
  • "That sort of behavior should be expected from a stop-and-go iterator" - it might be expected *by anything specifically expecting your weird iterator*, but it's *not* expected by typical code that takes iterators. By making this design decision, you are making your iterator implicitly unsafe to use with anything not specifically written with your iterator in mind. This leads to a lot of reinventing the wheel, and a lot of subtle bugs when you forget this unsafety and pass your iterator to code that expects iterators to obey the general contract of the iterator protocol. – user2357112 Nov 30 '21 at 06:19
  • @user2357112supportsMonica, By your reasoning it would be "premature optimization" to change `for i in [1]: print(i)` to `print(1)`. But I don't think this is "premature optimization" -- instead I think it's obviously just better code not to spawn an extra iterator when you don't need one. As far as I can tell, the very same reasoning applies to the OP. Worse, what the OP currently has is analogous to `print(1)` and you're effectively lobbying to "prematurely *de*-optimize" it by wrapping it in a needless iterator! – JustinFisher Dec 01 '21 at 02:51
  • @user2357112supportsMonica The various queues you mention are all iterable in the sense that it makes sense to start multiple iterators that can each traverse the queue at their own pace. In contrast, the OP's receiver consumes the packets as they're iterated, so setting up multiple iterators for it would lead to destructive interference between them. Rather than likening the Receiver to a queue, it'd be better to liken it to the basic Python mechanism for returning items one by one with no going back and no starting over: the iterator! – JustinFisher Dec 01 '21 at 02:59
  • @JustinFisher: " The various queues you mention are all iterable in the sense that it makes sense to start multiple iterators that can each traverse the queue at their own pace." - nope. `collections.deque` is like that, but `queue.Queue` and `multiprocessing.Queue` are closely analogous to your queue - the only supported way to read elements from the queue is to remove them from the queue. – user2357112 Dec 01 '21 at 03:02
  • Incidentally, the danger that @KellyBundy mentioned above (one way of pre-perusing an iterable to see its first and last elements will produce surprising results if new contents arrive mid-process) isn't likely to arise for the OP's receiver iterator, because (a) that iterator consumes the packets as they're iterated, so users should know that there'd be no way of going back if they skip to the end, and (b) new items can't surprisingly jump into the receiver's queue at any moment; they can arrive only in circumstances that the user has to knowingly bring about. – JustinFisher Dec 01 '21 at 03:27
  • "new items can't surprisingly jump into the receiver's queue at any moment, and instead can arrive only in circumstances that the user has to knowingly bring about" - until someone decides that this queue really ought to be populated by another thread, or a separate async task, and then the problems are back. – user2357112 Dec 01 '21 at 03:32
  • Also, that `get_first_and_last_elements` is *designed* for iterators with no way back. It's not supposed to pre-peruse an iterable. It doesn't even *support* general iterables, only iterators, and for most multi-use iterables where you'd want the first and last elements, you'd just use `thing[0]` and `thing[-1]`. – user2357112 Dec 01 '21 at 03:35
  • @user2357112supportsMonica Thanks for correcting me about some of the queues not even being iterable at all. One of the primary use cases for receivers would involve iterating over their available packets, so I wouldn't want something like a `queue.Queue` that eschews iterability and forces you to manually use something like `q.get()` to pop items one at a time. Another candidate syntax could be `receiver.pop()` though I'm not convinced that it has any real advantages over `next(receiver)`. – JustinFisher Dec 01 '21 at 03:43
  • Unfortunately, the underlying queue-like data structures for my receivers are hidden in a C library, so I can't use something like `queue.Queue.queue` to get at them (as much fun as that was to type :-) ). So instead I'm stuck just iterating through the packets one at a time via its API, or manually rebuilding my own copy of the queue in Python, but that probably isn't worth doing. Since what the C-API gives me is basically just an iterator (with `next` and `isempty` methods), I think it makes sense to wrap it with an iterator in Python, as that'll make it very easy to use. – JustinFisher Dec 01 '21 at 04:16
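
For reference, the danger discussed in the comments above looks roughly like this. (This is a reconstruction in the spirit of the `get_first_and_last_elements` example from the linked bug report, not a verbatim copy.)

def get_first_and_last_elements(iterator):
    # This standard idiom assumes StopIteration is final: once the for
    # loop ends, `last` is taken to be the genuinely final element.
    first = last = next(iterator)
    for last in iterator:
        pass
    return first, last

Handed a stop-and-go iterator, the for loop ends at the first temporary StopIteration, so the function quietly reports a "last" element that isn't last at all; generic code like this has no way to know that more items may follow.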

2 Answers


I'll add another not-fully-satisfactory answer, in case it helps anyone who's interested in exploring this. The requirement that an iterator never change its mind after issuing StopIteration dates back to the origin of iterators in Python in 2001, in PEP 234, which said:

Once a particular iterator object has raised StopIteration, will it also raise StopIteration on all subsequent next() calls? Some say that it would be useful to require this, others say that it is useful to leave this open to individual iterators. Note that this may require an additional state bit for some iterator implementations (e.g. function-wrapping iterators).

Resolution: once StopIteration is raised, calling it.next() continues to raise StopIteration.

Note: this was in fact not implemented in Python 2.2; there are many cases where an iterator's next() method can raise StopIteration on one call but not on the next. This has been remedied in Python 2.3.

The closest this comes to explaining the prohibition is saying "some say that it would be useful" (but also some don't). It also notes that "this may require an additional state bit for some iterator implementations (e.g. function-wrapping iterators)". But that seems to be more of a consideration against the prohibition than for it: obeying the prohibition may force iterator implementations to add an extra state bit just to remember that they're now officially retired. I guess one of the drawbacks of having a "benevolent dictator for life" is that his pronouncements often aren't all that well explained!
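
To see what that "extra state bit" amounts to, here's a sketch of a function-wrapping iterator (a hypothetical class, modeled loosely on the built-in iter(callable, sentinel)) that needs a _done flag purely in order to stay retired:

class FunctionIterator:
    """Calls `func` until it returns `sentinel`; then stops for good."""

    def __init__(self, func, sentinel):
        self.func = func
        self.sentinel = sentinel
        self._done = False             # the extra state bit

    def __iter__(self):
        return self

    def __next__(self):
        if self._done:
            raise StopIteration        # stay stopped, per the docs
        value = self.func()
        if value == self.sentinel:
            self._done = True
            raise StopIteration
        return value

data = iter([1, 2, None, 3])
it = FunctionIterator(lambda: next(data), sentinel=None)
print(list(it))   # [1, 2]
print(list(it))   # [] -- without _done, this would have yielded [3]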

Also, it says that this prohibition wasn't implemented back in Python 2.2, but that this was "remedied" in Python 2.3. Apparently it has been "unremedied" in the two decades (!!) since then! Or maybe what they "remedied" was just some particular iterators that hadn't obeyed this prohibition, not the fact that Python failed to enforce it?

I suspect that the goal here was just to tell people that they can generally expect iterators to keep saying that they've stopped once they've stopped, rather than risk causing errors, going into an infinite loop, or producing nonsensical return values if you accidentally call next() on them again. But it seems like the better pronouncement would have been to prohibit those sorts of bad behavior, not well-motivated designs where an iterator intentionally produces more "good" values after having temporarily stopped. Unfortunately, this suspicion probably isn't enough to assure anyone that there isn't some hidden danger in violating the prohibition.

  • "Or maybe what they "remedied" was just some particular iterators that hadn't obeyed this prohibition, not the fact that Python failed to enforce this prohibition?" - that is the case. This restriction has never been enforced. Similarly, if you write a `__hash__` method that just returns random numbers, that's a broken `__hash__` method, but Python won't throw an exception for you. Your object will just be completely unsafe to use with anything that expects working hash behavior. – user2357112 Dec 01 '21 at 03:18
  • For example, one of the iterators fixed was the default fallback iterator type, used for sequences that don't implement `__iter__`. [In Python 2.2](https://github.com/python/cpython/blob/v2.2/Objects/iterobject.c#L57), it might resume after raising StopIteration if extra elements were added to the underlying sequence. [In Python 2.3](https://github.com/python/cpython/blob/v2.3.1/Objects/iterobject.c#L46), the iterator now clears its reference to the underlying sequence when it hits the end, and always raises StopIteration if that reference has been cleared. – user2357112 Dec 01 '21 at 03:23

Another (now-deleted) answer pointed out that Python's built-in file objects produce an iterator that can be, and often is, restarted after stopping, which is some evidence that stop-and-go iterators can be perfectly functional, just not the way the docs say iterators "should" work.
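
You can see that file behavior for yourself with a minimal sketch using a throwaway temp file (list() here just drives the file's own iterator):

import os, tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w") as w:
    w.write("first\n")

f = open(path)
print(list(f))            # ['first\n'] -- the iterator raised StopIteration
with open(path, "a") as w:
    w.write("second\n")
print(list(f))            # ['second\n'] -- the same iterator resumed
f.close()
os.remove(path)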

Here's another simple example illustrating that an iterator can liven up again after raising StopIteration. But this doesn't explain why the Python docs discourage doing things this way, nor what the hidden dangers (if any) of doing so might be.

class StopAfterEachWord:
    """An iterator that stops at each space in the phrase, then resumes."""

    def __init__(self, phrase="stop and go"):
        self.phrase = phrase
        self.i = -1

    def __iter__(self):
        return self

    def __next__(self):
        self.i += 1
        # Stop permanently at the end of the phrase, and temporarily at each space.
        if self.i >= len(self.phrase) or self.phrase[self.i] == ' ':
            raise StopIteration
        return self.phrase[self.i]

it = StopAfterEachWord("stop and go")
for letter in it: print(letter)
print("The iterator has now stopped.")
for letter in it: print(letter)
print("The iterator stopped again.")
for letter in it: print(letter)

Try it online!

This example uses a single iterator called `it`. The latter two `for` loops illustrate that this iterator can continue working even after it has raised StopIteration to halt the earlier `for` loops.
