Looking back at this question after 6 months, I feel I can clarify the doubts of my younger self. I hope this will be helpful to people who stumble upon it.
Yes, it is true that with the multiprocessing module each process has its own GIL, and there are no caveats to that. However, the question's understanding of the runtime and the GIL is flawed and needs correcting.
I will clear up the doubts and answer the questions with a series of statements.
- Python code is run (compiled to CPython bytecode, which is then interpreted) by the CPython virtual machine. This is what constitutes the Python runtime.
- When we create a new process, an entire new Python virtual machine is launched (this is what we call the Python process), with its own stack and heap memory.
- Yes, this is costly, but not too costly, because the Python virtual machine is a piece of C code precompiled to machine code, so a new process starts quickly. To put this in perspective, Java programs rarely use multiprocessing for parallelism: every process would need its own JVM, which typically needs a lot of memory and takes longer to start, and Java threads can already run in parallel without a GIL.
- The GIL is just a lock (a mutex) inside the Python virtual machine that lets only one thread execute CPython bytecode at a time. So questions about GIL creation and its cost are misplaced; what the question really meant to ask about was the CPython virtual machine.
- Can I relate 1 GIL to 1 CPU core? It is better to ask whether 1 Python process can be related to 1 CPU core, and the answer is no. It is the kernel's job to decide which core a process runs on (this keeps changing over time, and by default the process has no say in it). The only guarantee is that, at any given point in time, only one thread of a Python process is executing CPython bytecode (due to the GIL), so pure-Python code in one process cannot use more than one core at once.
What gets copied between cores, and how the OS tries to keep a process on the core it is already running on, is a separate and very deep topic in itself.
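The point about each process being its own Python virtual machine, with its own heap, can be checked with a small sketch. The names `counter`, `bump_and_report` and `run_demo` are made up for illustration; the only assumption is that the script is run directly (so the `__main__` guard applies).

```python
import multiprocessing as mp
import os

counter = 0  # module-level state; every process ends up with its own copy


def bump_and_report(queue):
    """Runs in the child: change the global and send back what the child sees."""
    global counter
    counter += 100
    queue.put((os.getpid(), counter))


def run_demo():
    """Start one child process and compare its state with the parent's."""
    queue = mp.Queue()
    child = mp.Process(target=bump_and_report, args=(queue,))
    child.start()
    child_pid, child_counter = queue.get()  # read before join to avoid blocking
    child.join()
    return {
        "parent_pid": os.getpid(),
        "child_pid": child_pid,
        "parent_counter": counter,       # untouched by the child
        "child_counter": child_counter,  # changed only inside the child
    }


if __name__ == "__main__":
    print(run_demo())
```

The two PIDs differ, and the parent's `counter` stays at 0 even though the child raised its own copy to 100. Separate interpreter and separate memory also means a separate GIL.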
The final question is a subjective one, but with all this understanding, it basically comes down to a cost-to-benefit ratio that varies from program to program and depends on how CPU-intensive the work is, how many cores the machine has, and so on. So it cannot be generalised.
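One way to see that trade-off on a given machine is to time the same CPU-bound work sequentially and in a process pool. This is only a rough sketch: `count_primes` and `compare` are hypothetical helpers, the workload is arbitrary, and the numbers will differ from machine to machine.

```python
import multiprocessing as mp
import os
import time


def count_primes(limit):
    """Deliberately CPU-bound: count primes below `limit` by trial division."""
    found = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            found += 1
    return found


def compare(limits, workers=None):
    """Time the same jobs sequentially and in a process pool."""
    workers = workers or os.cpu_count() or 1

    start = time.perf_counter()
    sequential = [count_primes(x) for x in limits]
    sequential_s = time.perf_counter() - start

    start = time.perf_counter()
    with mp.Pool(processes=workers) as pool:
        parallel = pool.map(count_primes, limits)
    parallel_s = time.perf_counter() - start  # includes process startup cost

    return sequential, parallel, sequential_s, parallel_s


if __name__ == "__main__":
    seq, par, seq_s, par_s = compare([100_000] * 4)
    print(f"same results: {seq == par}")
    print(f"sequential: {seq_s:.2f}s, pool: {par_s:.2f}s")
```

With small inputs the pool can easily be slower, because starting the worker processes and shipping arguments and results between them costs more than the work itself. With larger inputs and several cores, the pool usually wins.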