9

The asyncio docs read:

Most asyncio objects are not thread safe. You should only worry if you access objects outside the event loop.

Could someone explain this or give an example of how misuse of asyncio can cause an unsynchronized write to an object shared between threads? I thought the GIL meant that only one thread can run the interpreter at a time and so events that happen in the interpreter, like reading and writing Python objects, are trivially synchronized between threads.

The second sentence in the quote above sounds like a clue but I'm not sure what to make of it.

I guess a thread could always cause havoc by releasing the GIL and deciding to write to Python objects anyway but that isn't specific to asyncio so I don't think that's what the docs are referring to here.

Is this maybe a matter of the asyncio PEPs reserving the option for certain asyncio objects to not be thread safe even though at the moment the implementation in CPython just so happens to be thread safe?

Praxeolitic
  • 2
    Some operations take multiple instructions, and another thread can be scheduled in between them. The GIL never *trivially synchronizes* a Python program, and that has nothing to do with asyncio. It just makes sure Python objects are thread safe on the C level, not on the Python level. – Niklas R Jan 04 '17 at 08:49

1 Answer

8

Actually, no. Each Python thread is exactly that: a new thread running the interpreter.

It is a real thread managed by the OS, not an internal thread managed by the Python virtual machine just for Python code.

The GIL is needed to keep this OS-level threading from corrupting Python objects.

Imagine one thread on one CPU and another on the other. Pure parallel threads, written in assembly. Both at the same time trying to change a registry value. Not a desirable circumstance at all. The instructions accessing the same memory location end up racing over what gets moved where and when, and the result of such an action may easily be a segmentation fault. Well, if we write in C, C controls that part, so that this doesn't happen in C code. The GIL does the same for Python code at the C level, so that the code implementing Python objects doesn't lose its atomicity when changing them. Imagine a thread inserting a value into a list that is just being shifted down in another thread, because that other thread removed some elements from it. Without the GIL this could corrupt the list or crash the interpreter.

The GIL does nothing for the atomicity of your own code within the threads. It only protects the interpreter's internal state, such as memory management.
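To make this concrete, here is a small sketch using the standard `dis` module. It shows that a single Python statement such as `counter += 1` compiles to several bytecode instructions, and a thread switch can happen between any two of them. The names `counter` and `increment` are made up for the illustration, and the exact instruction names differ between CPython versions.

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1  # looks atomic, but is a load, an add, and a store

# Collect the names of the bytecode instructions behind increment().
ops = [ins.opname for ins in dis.get_instructions(increment)]
print(ops)  # the exact list depends on the CPython version
```

If two threads both load the old value of `counter` before either one stores its result, one of the two increments is lost, even though the GIL was held for every individual instruction.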

Even if you have thread-safe objects like deque(), if you perform more than one operation on them in a row without an additional lock, an operation from another thread can land somewhere in between. And whoops, a problem occurs!

Let's say one thread takes an object from a stack, checks something about it, and removes it if the condition holds.

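from time import sleep

stack = [2, 3, 4, 5, 6, 7, 8]

def thread1():
    while True:
        v = stack[0]          # look at the top item...
        sleep(0.001)
        if v % 2 == 0:
            del stack[0]      # ...then remove whatever is on top *now*
        sleep(0.001)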

Of course, this is contrived, and real code would use stack.pop(0) to avoid the problem. But it serves as an example.

And let's have another thread that adds to the stack every 0.002 seconds:

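def thread2():
    while True:
        # Push a new value on top; stack[0] + 1 makes the top item
        # alternate between odd and even, so thread1 has something to remove.
        stack.insert(0, stack[0] + 1)
        sleep(0.002)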

Now if you do:

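from threading import Thread

Thread(target=thread2, daemon=True).start()
sleep(1)
Thread(target=thread1, daemon=True).start()
sleep(5)  # let both threads run for a while before the program exits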

At some point thread2() will push a new item exactly between thread1()'s retrieval and deletion. thread1() will then remove the newly added item instead of the one it checked, which is not what we wanted. So, GIL doesn't control what we are doing in our threads, just what threads are doing to each-other at a more basic level.
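Waiting for that unlucky moment is tedious, so here is a sketch that forces the same interleaving with `threading.Event` instead of relying on sleep timing. The names `checker`, `inserter`, `checked` and `inserted` are made up for the illustration; the events only stand in for the scheduler happening to switch threads at the worst possible time.

```python
import threading

stack = [2, 3, 4]
checked = threading.Event()   # set once checker() has read stack[0]
inserted = threading.Event()  # set once inserter() has pushed its item

def checker():
    v = stack[0]              # check: sees the even value 2
    checked.set()
    inserted.wait()           # the other thread runs right here
    if v % 2 == 0:
        del stack[0]          # act: removes whatever is on top now

def inserter():
    checked.wait()            # wait until the check has happened
    stack.insert(0, 99)       # push a new item between check and delete
    inserted.set()

threads = [threading.Thread(target=checker), threading.Thread(target=inserter)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(stack)  # [2, 3, 4]: the 99 was deleted, the checked 2 is still there
```

Every single list operation here was atomic under the GIL, yet the combined check-then-delete was not.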

Imagine you wrote a server for buying tickets for some event. Two users connect and try to buy the same ticket at the same time. If you are not careful, the users may end up sitting one on top of the other.
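One way to be careful is to put the check and the write under a single lock. The following is only a sketch under that assumption: `TicketOffice`, `buy`, `owner` and `customer` are hypothetical names, and a real server would need persistence and error handling on top.

```python
import threading

class TicketOffice:
    """Hypothetical in-memory ticket store, for illustration only."""

    def __init__(self, seats):
        self._owner = {seat: None for seat in seats}
        self._lock = threading.Lock()

    def buy(self, seat, user):
        # The check and the write happen under one lock, so no other
        # thread can slip in between them.
        with self._lock:
            if self._owner[seat] is None:
                self._owner[seat] = user
                return True
            return False

    def owner(self, seat):
        with self._lock:
            return self._owner[seat]

office = TicketOffice(["A1"])
results = {}

def customer(name):
    results[name] = office.buy("A1", name)

threads = [threading.Thread(target=customer, args=(n,)) for n in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

winners = [name for name, ok in results.items() if ok]
print(winners)  # exactly one name; which one depends on scheduling
```

Whichever thread gets the lock first wins the seat, and the other one sees that it is already taken.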

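A thread-safe object is one that performs an action and doesn't allow another action on it to take place until the first one has completed.

For instance, if one thread calls append() on a deque() while another thread calls popleft(), each call completes as a whole before the other one takes effect. This is thread-safe. Note that the guarantee covers single operations only: iterating over a deque() while another thread appends to it is not protected, and CPython raises RuntimeError ("deque mutated during iteration") in the iterating thread rather than making append() wait.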
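To tie this back to the asyncio docs quoted in the question: asyncio objects such as asyncio.Queue assume they are only touched from the thread running the event loop. A thread "outside the event loop" should hand its work over with loop.call_soon_threadsafe() instead of calling the object's methods directly; a direct call may not wake the loop up, among other problems. A minimal sketch, where `worker` and `main` are made-up names:

```python
import asyncio
import threading

async def main():
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()

    def worker():
        # Runs in a separate OS thread, outside the event loop.
        for i in range(3):
            # Ask the loop to run queue.put_nowait(i) in its own thread.
            loop.call_soon_threadsafe(queue.put_nowait, i)

    t = threading.Thread(target=worker)
    t.start()
    # The coroutine consumes the items inside the event loop thread.
    received = [await queue.get() for _ in range(3)]
    t.join()
    return received

received = asyncio.run(main())
print(received)  # [0, 1, 2]
```

For scheduling a whole coroutine from another thread, asyncio.run_coroutine_threadsafe() plays the same role.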

Dalen
  • `GIL doesn't control what we are doing in our threads, just what threads are doing to each-other` - this line is pure gold. – xyres Nov 15 '17 at 11:48
  • `Both at the same time trying to change a registry value. ... Well, if we write in C, C controls that part, so that this doesn't happen in C code.` If you mean "register value", the OS controls it, not C. Or the CPU, if the threads are on different cores. The C language and most of its standard library don't care about threads, letting the user shoot himself in the foot any way he wants. – al.zatv Jan 13 '23 at 13:28
  • @al.zatv : Yes registry term may be perceived wrongly. What I meant about C is that it creates a function stack, memory placeholder for local variable names within the function that are later translated to memory addresses and other stuff you would have to implement manually if you are programming in pure assembly. Stuff that C translates to when compiling. And no, OS has no time to analyze bytecode before or mid execution to determine whether your code is messing up allocated memory shared between threads or processes for that matter. So, no OS does nothing. – Dalen Jan 15 '23 at 09:58
  • @al.zatv : What I referred to is that C is our memory manager. C has no GIL, nor its emulation while compiling, so yes it would let you put your foot wherever you want. C stdlib is, yes, mostly not thread safe. What OS does control though are system calls. And stdlib is full of them. Like printf(), scanf(), open(), sleep(), alloc(), free() etc. etc. E.g. if you use printf() within threads you might get lines printed jumbled on the screen according to OS's priorities while executing your threads, but you wouldn't get letters mixed together, nor two lines convolved because of memory overlap. – Dalen Jan 15 '23 at 11:13
  • If you are crazy you may inject assembly code that writes directly to graphic card registers and execute your process in kernel space, and then you may achieve the mentioned scenarios or freeze your OS entirely. The OS also mostly detects memory leaks and some other little but important things that were usually set up by system calls. If you go out of bounds of the memory the OS allocated to you, for instance, you will receive the famous segmentation fault and, hopefully, a memory dump from the OS. As I said, the OS does nothing to ensure thread safety, otherwise Python wouldn't need the GIL in the first place. – Dalen Jan 15 '23 at 11:29
  • "you may inject assembly code that writes directly to graphic card registers and execute your process in kernel space" -- modern OSes usually don't allow you to do that, even in asm. You have to make a kernel module or driver for that. (As for graphic card registers, usually they can be manipulated with IN/OUT commands from the CPU; not 100% sure they are open in Linux/Windows userspace.) – al.zatv Feb 15 '23 at 14:07
  • @al.zatv : Yep. The point is, you can do it. I think that if you try to write to E10 and similar registers in ASM while being a module/driver, the OS has no choice but to let you do it. You are essentially part of the OS then, so it wouldn't even notice. I never tried it with an OS. Just wrote my own little OS for x86 in C & ASM. Nasty-nasty work. As for userspace, bear in mind that libraries such as SDL, OpenGL, DirectX ... do have a big amount of low-level control over graphics. So perhaps you can mess it up from userspace too. – Dalen Feb 15 '23 at 20:20
  • @Dalen opengl and directx have two parts: user-space libraries and drivers. They communicate via syscalls (or like that). So all dirty work is done by drivers. SDL is one more layer on top of OpenGL/DirectX, AFAIK. – al.zatv Mar 01 '23 at 16:54
  • @al.zatv : That wasn't the point of what I said. BTW, you can distribute SDL as a bunch of shared libraries with your app, so no drivers of SDL itself are installed. Certainly these libs, especially DirectX, use the native graphics system of the OS, which does run in kernel space, but again, that wasn't my point. Let us not stray from the context of the question too much, please. – Dalen Mar 02 '23 at 12:40