Not sure if it's still helpful after half a year, but it's probably worth answering for other users who are wondering about the same thing.
1. Who is responsible for copying these buffer data?
Both IOCP and io_uring work on the OS kernel side. In the case of `io_uring`, the kernel spawns worker threads that execute the submitted operations and signal their completion via the completion queue (CQ). This means you not only avoid calling `read()` and `write()` yourself, but these operations are also performed entirely inside the kernel, which spares your currently running thread unnecessary syscalls (context switches between user and kernel mode are quite expensive).
You can check the following article to understand it a bit better: https://blog.cloudflare.com/missing-manuals-io_uring-worker-pool/
In addition, you can think of `io_uring` as an efficient mechanism for batch execution of syscalls: it allows you to call many OS functions for the price of a single syscall, `io_uring_enter`.
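To illustrate the batching, here is a minimal sketch using liburing (error handling omitted; the file name and offsets are just placeholders): two reads are queued into the submission queue and handed to the kernel with one `io_uring_submit()` call, i.e. a single `io_uring_enter` syscall, and the kernel copies the data into the buffers while the thread only reaps the completions.

```c
#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    struct io_uring ring;
    io_uring_queue_init(8, &ring, 0);

    int fd = open("test.txt", O_RDONLY);   /* placeholder file */
    static char buf1[4096], buf2[4096];

    /* Queue two reads; nothing is handed to the kernel yet. */
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_read(sqe, fd, buf1, sizeof(buf1), 0);     /* offset 0 */
    sqe->user_data = 1;

    sqe = io_uring_get_sqe(&ring);
    io_uring_prep_read(sqe, fd, buf2, sizeof(buf2), 4096);  /* offset 4096 */
    sqe->user_data = 2;

    /* One io_uring_enter() syscall submits both operations. */
    io_uring_submit(&ring);

    /* The kernel performs the reads and fills the buffers; we just reap
       the two completions from the CQ. */
    for (int i = 0; i < 2; i++) {
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);
        printf("read #%llu returned %d\n",
               (unsigned long long)cqe->user_data, cqe->res);
        io_uring_cqe_seen(&ring, cqe);
    }

    io_uring_queue_exit(&ring);
    close(fd);
    return 0;
}
```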
The IOCP mechanism is quite similar. I wasn't able to find exactly how it utilizes kernel threads to execute the tasks, but it is safe to assume that it uses at least one kernel thread to handle its driver IRPs (I/O request packets).
To answer your question: it is the kernel and its kernel-mode threads that are responsible for copying the buffer data.
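For completeness, the application-side flow with IOCP looks roughly like this (a minimal sketch, error handling omitted, and the file name is just an example): the overlapped `ReadFile` returns immediately, the kernel fills the buffer, and the thread only blocks on `GetQueuedCompletionStatus` until the completion packet arrives.

```c
#include <windows.h>
#include <stdio.h>

int main(void) {
    HANDLE file = CreateFileA("test.txt", GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    /* Create a completion port and associate the file handle with it. */
    HANDLE iocp = CreateIoCompletionPort(file, NULL, /* CompletionKey */ 1, 0);

    char buf[4096];
    OVERLAPPED ov = {0};                          /* read from offset 0 */
    ReadFile(file, buf, sizeof(buf), NULL, &ov);  /* returns right away,
                                                     typically ERROR_IO_PENDING */

    DWORD bytes;
    ULONG_PTR key;
    LPOVERLAPPED pov;
    /* The copy into buf is done by the kernel; here we only wait for the
       completion packet. */
    GetQueuedCompletionStatus(iocp, &bytes, &key, &pov, INFINITE);
    printf("read %lu bytes\n", bytes);

    CloseHandle(iocp);
    CloseHandle(file);
    return 0;
}
```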
2. Does read/write still block the current thread?
If you use Overlapped I/O or `io_uring` with non-blocking files/sockets, the calls submitted to the kernel don't block the current thread. You only need to block your thread when you decide to wait for entries in the completion queue; polling it instead never blocks.
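For example, with liburing the difference is roughly the following (just a sketch of a helper, not a complete program): `io_uring_wait_cqe()` puts the thread to sleep until a completion arrives, while `io_uring_peek_cqe()` only checks the CQ and returns immediately.

```c
#include <liburing.h>
#include <stdio.h>

/* Drain whatever completions are already available without blocking.
   io_uring_peek_cqe() returns -EAGAIN when the CQ is empty, so the calling
   thread never sleeps here; io_uring_wait_cqe() is the blocking counterpart. */
void drain_completions(struct io_uring *ring) {
    struct io_uring_cqe *cqe;
    while (io_uring_peek_cqe(ring, &cqe) == 0) {
        /* cqe->res is what read()/write() would have returned
           (byte count or a negative errno). */
        printf("op %llu finished with result %d\n",
               (unsigned long long)cqe->user_data, cqe->res);
        io_uring_cqe_seen(ring, cqe);
    }
}
```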
A little addition about `epoll` and blocking reads or writes:
Reads and writes on ready file descriptors don't really "block" your thread: if there is data available on a socket, `read()` simply copies it from the kernel buffer into your own buffer and returns. There is no real blocking beyond the price of the syscall. However, it's still possible to parallelize those operations using a thread pool. You can even parallelize reads for a single socket, but that requires an understanding of the `EPOLLONESHOT` and `EPOLLEXCLUSIVE` flags to avoid race conditions and the "thundering herd" problem.
This is very well explained in this article: https://idea.popcount.org/2017-02-20-epoll-is-fundamentally-broken-12/
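To give a rough idea of the `EPOLLONESHOT` re-arm pattern mentioned above, here is a minimal sketch (error handling omitted; the buffer size and function names are arbitrary): the descriptor is reported to exactly one worker thread and stays disabled until that thread re-arms it with `EPOLL_CTL_MOD`.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Register the socket: EPOLLONESHOT means the fd is reported once and then
   disabled until it is explicitly re-armed, so only one worker thread
   handles it at a time. */
void register_socket(int epfd, int sockfd) {
    struct epoll_event ev = {
        .events = EPOLLIN | EPOLLONESHOT,
        .data = { .fd = sockfd },
    };
    epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev);
}

/* Called by a worker thread after epoll_wait() reported the fd as ready. */
void handle_ready(int epfd, int sockfd) {
    char buf[4096];
    /* The fd is ready, so read() just copies data that is already sitting in
       the kernel buffer; with O_NONBLOCK it can at worst return EAGAIN. */
    ssize_t n = read(sockfd, buf, sizeof(buf));
    (void)n;  /* process the data here */

    /* Re-arm the descriptor so the next event can be delivered
       (possibly to a different thread). */
    struct epoll_event ev = {
        .events = EPOLLIN | EPOLLONESHOT,
        .data = { .fd = sockfd },
    };
    epoll_ctl(epfd, EPOLL_CTL_MOD, sockfd, &ev);
}
```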