I created a POC based on the Cloudflare blog post "SOCKMAP - TCP splicing of the future".
In the meantime I refactored my redirection to use bpf_sk_redirect_hash().
I use iperf and sar to measure the performance and the load on the system. What I notice: if iperf sends 4 GByte of data, the memory consumption also increases by 4 GByte of RAM. I enabled kernel tracing with

echo 1 > /sys/kernel/debug/tracing/events/kmem/kmalloc/enable
echo 1 > /sys/kernel/debug/tracing/events/skb/kfree_skb/enable
echo 1 > /sys/kernel/debug/tracing/events/skb/enable

and saw the following in trace_pipe:
A lot of:
kworker/1:1-62 [001] b.s41 1286.210300: bpf_trace_printk: sockmap: _prog_parser() ip: 167772674 port: 57914 len: 10220
kworker/1:1-62 [001] b.s41 1286.210300: bpf_trace_printk: sockhash: extract_socket_key() remote_ip4 167772674 remote_port 57914
kworker/1:1-62 [001] b.s41 1286.210301: bpf_trace_printk: sockhash: extract_socket_key() local ip4 167772687 local_port 20000
kworker/1:1-62 [001] b.s41 1286.210301: bpf_trace_printk: sockhash: bpf_prog_verdict() bpf_sk_redirect_map() -> 1
kworker/1:1-62 [001] b.s4. 1286.210303: kmalloc: call_site=pskb_expand_head+0x92/0x380 ptr=00000000f65f48cc bytes_req=1024 bytes_alloc=1024 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC node=-1 accounted=false
kworker/1:1-62 [001] ..s1. 1286.210350: consume_skb: skbaddr=0000000012add251 location=e1000_unmap_and_free_tx_resource+0x4b/0x70 [e1000]
Sometimes:
kworker/1:1-62 [001] ..... 1286.210594: kfree_skb: skbaddr=000000006a44a004 protocol=2048 location=sk_psock_backlog+0x282/0x2f0 reason: NOT_SPECIFIED
kworker/1:1-62 [001] ..... 1286.210611: kfree_skb: skbaddr=00000000994c98aa protocol=2048 location=sk_psock_backlog+0x282/0x2f0 reason: NOT_SPECIFIED
When the sockets are closed, I saw a lot of lines (more than 1000) like:
kworker/2:1-74 [002] ..... 1286.332662: kfree_skb: skbaddr=000000000a070353 protocol=2048 location=sk_psock_destroy+0x8a/0x2c0 reason: NOT_SPECIFIED
kworker/2:1-74 [002] ..... 1286.332662: kfree_skb: skbaddr=00000000da28f450 protocol=2048 location=skb_release_data+0x137/0x1c0 reason: NOT_SPECIFIED
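To double-check where the 4 GByte actually lands (kernel slab for the queued skbs vs. user-space RSS), the kernel memory counters can be compared before and after the transfer. A simple sketch:

```shell
#!/bin/sh
# Snapshot the kernel memory counters; run this before and after the
# iperf transfer. If the growth shows up in Slab/SUnreclaim, the data
# is being held in kernel skb queues rather than in the application.
grep -E '^(Slab|SUnreclaim|Buffers):' /proc/meminfo
```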
I tried to figure out what sk_psock_backlog is, searching the kernel sources, the internet, etc.
ChatGPT gave me this answer:
As of my last update in September 2021, the sk_psock_backlog refers to the backlog queue for a Packet Socket (psock) in the Linux kernel. The sk_psock_backlog is used to store packets that are received by a packet socket but cannot be immediately delivered to the user space.
...
The purpose of the backlog queue is to ensure that packets are not dropped when they arrive faster than the application can process them. Instead, the kernel temporarily stores these packets in the backlog queue, waiting for the user space application to read them from the socket. This allows the application to catch up and process packets at its own pace.
If this is the case, how can I disable it?
Because I use Boost.Asio to create the socket, I tried to figure out whether Boost.Asio enables something on the socket. I scanned the Boost.Asio source code, but there are no such socket settings. I also refactored my code and switched from sockmap to sockhash, but that didn't help.
From ChatGPT I got the following hints:
You can set the socket option SO_RCVBUF using the setsockopt() system call to set the receive buffer size.
I tried with 0, but this didn't help.
Use a raw socket (AF_PACKET) with a BPF program: Instead of using a standard packet socket with the backlog queue
I tried, but I was not successful.
Use XDP (eXpress Data Path): XDP is another powerful eBPF-based technology in the Linux kernel that allows you to process packets at the earliest stage possible (before they enter the networking stack).
Is this really the only option?
Regarding sockmap, I tried to understand the tests and examples in the kernel sources. But all the examples and tests somehow still use the socket in user space, instead of just waiting via poll() for the socket to close.
I'm stuck now and have no clue how I can fix "the memory consumption". Every hint or suggestion would be really helpful. Thanks