
I'm trying to figure out how to retrieve data from io_uring efficiently. Short recv/send completions and the fragile SQE links have me blasting requests at the kernel, with most of them being cancelled.

If I request a recv of 8 MB (basically the size of my user-space buffer) and a 50-byte packet arrives (assume no low-water mark), a short recv is posted to the completion queue with the data in the buffer.

And since it is not a complete read, it causes any linked operations to be cancelled.

If more data arrives on the socket, the kernel can't post more to that same request, can it? When trying to reduce latency to a minimum, do I need to keep multiple outstanding, non-linked requests, all of which can easily end up as short reads sitting in the completion queue?

Is there anything that describes this flow in detail?

  • I'm having a difficult time figuring out what others are doing. To me it seems like they are definitely leaving data in the kernel buffers, but I might be misunderstanding the policy on when short reads are allowed for network I/O. Almost all network I/O reads would be short reads if that were the case, though, and that would be a little crazy, so I must be missing something. – JasonN Sep 09 '22 at 10:31

1 Answer


"And since it is not a complete read"

Which read opcode are you using: IORING_OP_READ_FIXED, IORING_OP_READV, IORING_OP_READ, or IORING_OP_RECV? Only the last one has a "wait for all data" flag; the first three post a successful completion after receiving any amount of data, from 1 byte up to your buffer size.
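For illustration, a minimal liburing sketch of that "wait for all data" variant, assuming a kernel recent enough to honor MSG_WAITALL for IORING_OP_RECV (the ring, socket, and buffer names are hypothetical; error handling omitted):

    #include <liburing.h>
    #include <sys/socket.h>

    /* Queue an IORING_OP_RECV that completes only once `len` bytes have
     * arrived (or on EOF/error), rather than on the first packet. */
    static void queue_recv_waitall(struct io_uring *ring, int sockfd,
                                   void *buf, size_t len)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        io_uring_prep_recv(sqe, sockfd, buf, len, MSG_WAITALL);
        io_uring_submit(ring);
    }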

About linked requests: what are you linking with the read? A write to another socket? For that, it is better to reverse the order: issue the write of the already-received data first (with a non-linked read the first time around), then a linked read for new data.
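A hedged sketch of that ordering (function and variable names are hypothetical; error handling omitted): send the data already in hand, and chain the next recv behind it with IOSQE_IO_LINK so it only runs if the send succeeds.

    /* Send already-received data, then a linked recv for new data. */
    static void queue_send_then_recv(struct io_uring *ring,
                                     int out_fd, const void *data, size_t data_len,
                                     int in_fd, void *buf, size_t buf_len)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        io_uring_prep_send(sqe, out_fd, data, data_len, 0);
        sqe->flags |= IOSQE_IO_LINK;   /* next SQE runs only if this one succeeds */

        sqe = io_uring_get_sqe(ring);
        io_uring_prep_recv(sqe, in_fd, buf, buf_len, 0);

        io_uring_submit(ring);
    }

Note that a short send still breaks the link, so this ordering reduces, rather than eliminates, the cancellations the question describes.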

Yuri Myakotin
  • READ_FIXED. A better way to ask this might be: when is the CQE filled in? If it is filled in when the first packet arrives on a larger request, what happens when more data arrives for the same request? Does it update the length in the read request's response? If you issue a 1 MB read, what exactly happens when each packet arrives? (This is all for TCP.) – JasonN Sep 11 '22 at 23:32
  • It works exactly the same as a normal synchronous read: it completes when any amount of data arrives. If you issue a 1 MB read, you can get anything from one packet's worth up to 1 MB (if data was already in the kernel buffers before you called recv). For example, when the other side is connected via telnet, you receive 1 byte each time the other side types something. – Yuri Myakotin Sep 12 '22 at 05:59
  • Assume nothing is in the buffer already. If it completes on the first data, shouldn't most (all?) TCP reads be < 1500 bytes, since that first packet will cause the read to complete with data? When does data move from the kernel buffer to the shared buffer? If it is always there, something has to update the completion message. If it gets transferred as the result of a call (e.g., a peek), then what calls cause this? Maybe it relies on the NIC coalescing packet interrupts? Polling the card wouldn't be very efficient for io_uring then. – JasonN Sep 12 '22 at 08:21
  • I think it depends on the specific kernel/NIC driver implementation. Anyway, TCP is a stream, not messages. If you need to read a specific amount of data, you have to sacrifice the fixed-buffer advantages and use IORING_OP_RECV with the MSG_WAITALL flag. Or just compare bytes_already_received with bytes_need_to_be_received (adding the already-received bytes as an offset to the data pointer) on each IORING_OP_READ_FIXED completion, as sketched after these comments. Or (best, if you can) process the data in small parts, not in large blocks. – Yuri Myakotin Sep 12 '22 at 11:18
  • MSG_WAITALL doesn't work either (and wouldn't solve my problem). This is a pretty common problem too. – JasonN Sep 16 '22 at 17:13
  • Again: what are you trying to do? Receiving NNN bytes, then what's next? What are you trying to link after the receive? – Yuri Myakotin Sep 16 '22 at 18:38
  • The goal is to grab everything the kernel has right now. The connection is somewhat bursty at its peaks and also has substantial traffic. Latency is the issue, not throughput. Under epoll that wasn't an issue: as long as I sized my user buffer large enough, I read MAX_BUF every time and I was good. I'm trying to improve on that with io_uring, submit-queue polling, and fixed buffers. There are no real message boundaries here; I just want it all. The goal is to reduce syscalls, of course; doing a read() call afterwards doesn't help (it turns this into an expensive epoll). – JasonN Sep 18 '22 at 00:10
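A sketch of the bookkeeping approach from the comments above (function and variable names are hypothetical; error handling is minimal): re-arm IORING_OP_READ_FIXED on each short completion, offsetting into the registered buffer until the target length is reached.

    /* Read exactly `need` bytes into a registered buffer, re-queueing
     * a READ_FIXED after each short completion. */
    static int read_exactly(struct io_uring *ring, int sockfd,
                            char *fixed_buf, size_t need, int buf_index)
    {
        size_t got = 0;
        while (got < need) {
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
            /* Offset the pointer within the registered buffer; the file
             * offset is 0 because sockets are not seekable. */
            io_uring_prep_read_fixed(sqe, sockfd, fixed_buf + got,
                                     need - got, 0, buf_index);
            io_uring_submit(ring);

            struct io_uring_cqe *cqe;
            io_uring_wait_cqe(ring, &cqe);
            int res = cqe->res;
            io_uring_cqe_seen(ring, cqe);
            if (res <= 0)
                return -1;             /* error or EOF */
            got += (size_t)res;
        }
        return 0;
    }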