From Puma's README:
On MRI, there is a Global VM Lock (GVL) that ensures only one thread can run Ruby code at a time. But if you're doing a lot of blocking IO (such as HTTP calls to external APIs like Twitter), Puma still improves MRI's throughput by allowing IO waiting to be done in parallel.
Unfortunately, it does not explain the mechanism by which Puma improves MRI's throughput.
I know that MRI releases the GIL when a thread makes a blocking system IO call, but that is an improvement provided by MRI itself, not by Puma.
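To illustrate what I mean, here is a small sketch (using sleep as a stand-in for blocking IO, since MRI releases the GVL while a thread sleeps or waits on IO) showing that plain MRI threads already overlap their waits:

```ruby
require "benchmark"

# Four threads each "block" for 0.5 seconds. Because MRI releases
# the GVL during the wait, the sleeps overlap instead of serializing.
elapsed = Benchmark.realtime do
  threads = 4.times.map { Thread.new { sleep 0.5 } }
  threads.each(&:join)
end

# Wall time is roughly 0.5s, not the ~2.0s a serial run would take.
puts elapsed
```

This works with nothing but MRI's own Thread class, which is why I am asking what Puma adds on top of it.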
I wonder how exactly Puma makes blocking IO run in parallel.
Any reference would be appreciated.