Is epoll a better API than io_uring?

Question

With io_uring you have to submit a new read request whenever the previous read request has completed. This is unnatural in a lot of cases because you usually just want to keep reading from a TCP connection. With epoll you register a file descriptor with the kernel's epoll instance once and then get notified whenever new data is available to read. (What is "natural" is subjective, of course.)

There is, of course, the problem with epoll that you have to make repeated "read" syscalls to get to the actual data and in this regard io_uring is clearly better. So my statement mainly relates to the abstract semantics of the API. However, I could also see situations in which repeated read requests could pose a performance problem in io_uring, for example, for servers with a lot of connections (say, 20k) that all do a lot of very short reads (say, 4 bytes).

Am I missing something here? Can io_uring be used in a mode where a single submission queue entry (SQE) can result in multiple completion queue entries (CQEs)?

It seems to me that your mileage will vary depending on the application domain. If you want to know the gains for a specific application, just profile it. It also seems to me that the kernel/io_uring devs would have good insights on the applicability spectrum. In principle I don't see a big difference between the two, as you pointed out, except the obvious reduction of buffer copying, which is the point of io_uring buffers AFAICT. - sehe, Jul 02 '23 at 11:33

Jacob Shtokolov · Answer 1 · 2023-07-04T15:46:43.367

Basic Explanation

In terms of the API, the proactor pattern (io_uring, IOCP, IoRing) is superior to the reactor pattern (epoll, kqueue, etc.) because it mimics the natural program control flow: you "call" an asynchronous function (by scheduling it for execution) and then wait for the result by reading the completion queue, or by waiting on the "completion port".

In the blocking mode, the typical code looks like this (pseudocode):

char buf[255];
ssize_t ret = recv(socket, buf, sizeof(buf), 0);
// Now buf holds the data and ret is the number of bytes read (or -1 on error)
// ...

The non-blocking mode in the proactor pattern is similar, except that we can issue multiple syscalls at once (pseudocode again):

char buf1[255];
char buf2[255];
char buf3[255];

// wait() is pseudocode: submit all three requests, then block until all complete
(ret1, ret2, ret3) = wait(
    recv(socket1, buf1, sizeof(buf1), 0),
    recv(socket2, buf2, sizeof(buf2), 0),
    recv(socket3, buf3, sizeof(buf3), 0)
);
// Now we have all the buffers and return values
// ...

This model not only reduces the mental burden on the programmer but also makes it possible to spread the workload across multiple CPU cores under the hood by using kernel threads. Such scaling is especially beneficial for file IO, because regular files have no readiness-based non-blocking interface: a read or write on a regular file can block the calling thread even with O_NONBLOCK set.
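The file IO point can be checked from the reactor side as well. Below is a minimal, Linux-only sketch showing that epoll refuses regular files outright: `epoll_ctl` fails with `EPERM` for them, while a pipe registers fine. The helper names `try_epoll_register`, `epoll_rejects_regular_file`, and `epoll_accepts_pipe` are made up for this example and are not part of any API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: try to register `fd` for readability with a fresh
 * epoll instance. Returns 0 on success, or the errno value on failure. */
int try_epoll_register(int fd) {
    assert(fd >= 0);

    int ep = epoll_create1(0);
    if (ep < 0)
        return errno;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    int err = 0;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0)
        err = errno;            /* save before close() can overwrite it */

    close(ep);
    return err;
}

/* Returns 1 if epoll rejects a regular (temporary) file with EPERM,
 * 0 if it does not, and -1 if the temporary file could not be created. */
int epoll_rejects_regular_file(void) {
    FILE *f = tmpfile();
    if (!f)
        return -1;
    int err = try_epoll_register(fileno(f));
    fclose(f);
    return err == EPERM;
}

/* Returns 1 if the read end of a pipe registers without error,
 * 0 if it does not, and -1 if the pipe could not be created. */
int epoll_accepts_pipe(void) {
    int p[2];
    if (pipe(p) < 0)
        return -1;
    int err = try_epoll_register(p[0]);
    close(p[0]);
    close(p[1]);
    return err == 0;
}
```

Since a regular file cannot be registered at all, epoll-based programs typically hand file reads to a thread pool, which is the part io_uring takes over for you.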

Earlier Linux attempts at asynchronous file IO, such as POSIX AIO, were very limited and rather ugly, so io_uring is an evolutionary step in the right direction.

However, the proactor pattern has some obvious downsides, such as the need to keep a buffer in RAM for each ongoing read/recv call. This is negligible at first, but once you have to handle many connections, you'll need a lot of memory that sits idle while waiting for completions.

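io_uring partially solves this problem by offering buffer pooling facilities (provided buffers, from which the kernel picks a buffer only when data actually arrives), but that's still nowhere close to what you can do with a single-threaded epoll event loop, where one buffer can serve every connection because you only read once data is ready.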
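To make the memory argument concrete: with one pending recv per connection, 20,000 connections and a hypothetical 16 KiB buffer each would pin about 328 MB (20,000 × 16,384 = 327,680,000 bytes). A reactor loop needs only one buffer, as in the rough Linux-only sketch below. The function name `drain_with_shared_buffer`, the 255-byte buffer, and the 1-second timeout are assumptions made for illustration, not a recommended design.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: watch `nfds` descriptors with one epoll instance and
 * read until `expected` bytes have arrived in total, using a single buffer
 * for every descriptor. Returns the bytes read so far, or -1 on setup error. */
long drain_with_shared_buffer(const int *fds, int nfds, long expected) {
    assert(fds != NULL && nfds > 0);

    int ep = epoll_create1(0);
    if (ep < 0)
        return -1;

    /* Register each descriptor once; no re-arming is needed afterwards. */
    for (int i = 0; i < nfds; i++) {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[i] };
        if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(ep);
            return -1;
        }
    }

    char buf[255];              /* the only read buffer, shared by all fds */
    long total = 0;
    while (total < expected) {
        struct epoll_event ready[16];
        int n = epoll_wait(ep, ready, 16, 1000 /* ms, arbitrary */);
        if (n <= 0)
            break;              /* error or timeout: give up */

        for (int i = 0; i < n; i++) {
            int fd = ready[i].data.fd;
            /* The descriptor was reported ready, so this read returns
             * immediately, but it is still one syscall per ready fd. */
            ssize_t got = read(fd, buf, sizeof(buf));
            if (got > 0)
                total += got;
            else
                epoll_ctl(ep, EPOLL_CTL_DEL, fd, NULL); /* EOF or error */
        }
    }

    close(ep);
    return total;
}
```

The trade-off is the one from the question: the buffer count stays at one, but every ready descriptor still costs a separate read syscall.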

Repeated scheduling problem

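As for your problem of repeated scheduling, io_uring actually offers a "multishot" mode for some of its operations (for example accept, poll, and recv), in which a single SQE keeps producing CQEs until it is cancelled or fails.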

AFAIK, timeouts also support this mode, which in effect turns them into repeating timers. But the main problem is that io_uring is still under development, so some of those features are available only in the newest Linux kernels (6.0+).

Summary

So the answer is: io_uring is the better API. It comes at a price, but it handles multi-threading, file IO, and other things out of the box. epoll, on the other hand, provides more granular control over buffering and function calls, but once you need to deal with files (or multiple threads), you're on your own.

epoll can still be relevant for low-memory devices, but on modern systems, it'd be more beneficial to plan for io_uring support, because it's probably going to replace select, poll, and epoll in the future.

However, since io_uring is still under development, it has been a recurring source of serious kernel vulnerabilities, and some companies, Google among them, have restricted or disabled it in their products. This fact is also worth considering when choosing between the two.
