4

A common idiom in CPython to ensure thread safety for iteration is using tuple().

For example - tuple(dict.items()) is guaranteed to be threadsafe in CPython even if items are removed by a different thread.

This is because the interpreter does not run the eval loop and does not release the GIL while running these C functions. I've tested it, and it works great.

However, tuple(reversed(dict.items())) doesn't seem to be threadsafe, and I can't understand why. It's not running any Python function, it isn't explicitly releasing the GIL. Why am I still getting errors if I delete keys from the dict while it runs on a different thread?

Bharel
  • 23,672
  • 5
  • 40
  • 80
  • `reversed` working on `dict` and related dict views are relatively new, see [here](https://docs.python.org/3/whatsnew/3.8.html#other-language-changes) and [here](https://bugs.python.org/issue33462) Maybe it's an actual bug. – juanpa.arrivillaga Feb 02 '21 at 05:03
  • @juanpa.arrivillaga I'm not sure a bug is the correct term, as the GIL does not give you this kind of guarantee per-se, and you're technically supposed to use a lock. It is however unexpected, and I wonder why it works the way it does. – Bharel Feb 02 '21 at 05:52
  • 1
    Perhaps *oversight*? – juanpa.arrivillaga Feb 02 '21 at 05:53
  • Definitely, if ou need thread safety on such data structures you should use an explicit lock, even if in a given Python version you can get away with it. Remember this unpredictability in threaded concurrency is one of the best arguments in favor of modern single-threaded async. Even for the "tuple" constructor, this "thread safety" should be considered an implementation detail. – jsbueno Feb 03 '21 at 01:51
  • @jsbueno ofc it should count as an implementation detail, but then again why does the gil get released? – Bharel Feb 03 '21 at 03:49
  • I went to check the PR code that implements it, and I have no idea how it ended up being any different from direct iteration. :-) That is why I did not post an actual answer. – jsbueno Feb 03 '21 at 12:52

1 Answers1

4

Modifying the size of a dict while iterating over it will always give an error. The reason why tuple(d.items()) is thread-safe, is because both retrieval of the iterator and iteration happen in the same C function. d.items() creates a dict_items object but there's no iterator yet. That's why changes in dictionary size are still reflected:

>>> d = {'a': 1, 'b': 2}
>>> view = d.items()
>>> del d['a']
>>> list(view)
[('b', 2)]

However once the iterator has been retrieved the dict size must not change anymore:

>>> d = {'a': 1, 'b': 2}
>>> iterator = iter(d.items())
>>> del d['a']
>>> list(iterator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration

And this is what happens when using reversed: it creates a reverse iterator of the dict items. It's the iterator part that causes the trouble, because once the iterator has been created, the underlying dict must not change in size:

>>> d = {'a': 1, 'b': 2}
>>> r = reversed(d.items())
>>> r  # Note the iterator here.
<dict_reverseitemiterator object at 0x7fb3a4aa24f0>
>>> del d['a']
>>> list(r)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration

So in the specific example tuple(reversed(dict.items())) an iterator reversed(dict.items()) of the original dict is created which is then iterated over by tuple. This iterator requires however that the dict does not change in size. It's like tuple(iter(dict.items())) just in reverse order.

Regarding the GIL switch, the eval loop runs when it's taking the result of reversed(), after creating the iterator, and sending it to tuple() for iteration. See the following dissasembly:

>>> dis.dis("tuple({}.items())")
  1           0 LOAD_NAME                0 (tuple)
              2 BUILD_MAP                0
              4 LOAD_METHOD              1 (items)
              6 CALL_METHOD              0
              8 CALL_FUNCTION            1
             10 RETURN_VALUE
>>> dis.dis("tuple(reversed({}.items()))")
  1           0 LOAD_NAME                0 (tuple)
              2 LOAD_NAME                1 (reversed)
              4 BUILD_MAP                0
              6 LOAD_METHOD              2 (items)
              8 CALL_METHOD              0
             10 CALL_FUNCTION            1
             12 CALL_FUNCTION            1
             14 RETURN_VALUE
Bharel
  • 23,672
  • 5
  • 40
  • 80
a_guest
  • 34,165
  • 12
  • 64
  • 118
  • 1
    Awesome, so the final point is that the eval loop does run, after creating the iterator. It happens when it's taking the result of `reversed()`, and sending it to `tuple()`. – Bharel Feb 09 '21 at 01:32