DNS in K8S pod: tcpdump shows bad udp checksum but nslookup still works and UDP error counters don't increment

Question

I have a K8S pod. Inside pod, I do dns lookup using nslookup. It works fine. But when I do tcpdump on pod interface (eth0), it clearly shows received dns response has bad udp checksum. I checked with netstat the udp counters, but I dont see the checksum error counter (InCsumErrors) at all getting hit. Here are some relevant outputs.

IP config of pod:

root@node:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if10936: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether e2:22:5c:6c:53:bd brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.233.85.177/32 scope global eth0
       valid_lft forever preferred_lft forever

Successfull Nslookup:

bash-4.4# nslookup google.com    
Server:     169.254.25.10
Address:    169.254.25.10#53

Non-authoritative answer:
Name:   google.com
Address: 216.58.207.238
Name:   google.com
Address: 2a00:1450:400e:809::200e

Tcpdump showing bad udp cksum for above nslookup run:

root@node:~# tcpdump -ni eth0 -vvv
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:02:24.267999 IP (tos 0x0, ttl 64, id 50356, offset 0, flags [none], proto UDP (17), length 82)
    10.233.85.177.52764 > 169.254.25.10.53: [bad udp cksum 0x23f2 -> 0xd1bd!] 43806+ A? google.com.qaammuk.svc.cluster.local. (54)
16:02:24.269489 IP (tos 0x0, ttl 64, id 56987, offset 0, flags [DF], proto UDP (17), length 175)
    169.254.25.10.53 > 10.233.85.177.52764: [bad udp cksum 0x244f -> 0x2c2a!] 43806 NXDomain*- q: A? google.com.qaammuk.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1609862082 7200 1800 86400 5 (147)
16:02:24.269847 IP (tos 0x0, ttl 64, id 50357, offset 0, flags [none], proto UDP (17), length 74)
    10.233.85.177.39433 > 169.254.25.10.53: [bad udp cksum 0x23ea -> 0xac65!] 45029+ A? google.com.svc.cluster.local. (46)
16:02:24.270901 IP (tos 0x0, ttl 64, id 56988, offset 0, flags [DF], proto UDP (17), length 167)
    169.254.25.10.53 > 10.233.85.177.39433: [bad udp cksum 0x2447 -> 0x06d2!] 45029 NXDomain*- q: A? google.com.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1609862082 7200 1800 86400 5 (139)
16:02:24.271206 IP (tos 0x0, ttl 64, id 50358, offset 0, flags [none], proto UDP (17), length 70)
    10.233.85.177.59330 > 169.254.25.10.53: [bad udp cksum 0x23e6 -> 0xdaca!] 2633+ A? google.com.cluster.local. (42)
16:02:24.272262 IP (tos 0x0, ttl 64, id 56989, offset 0, flags [DF], proto UDP (17), length 163)
    169.254.25.10.53 > 10.233.85.177.59330: [bad udp cksum 0x2443 -> 0x3537!] 2633 NXDomain*- q: A? google.com.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1609862082 7200 1800 86400 5 (135)
16:02:24.272527 IP (tos 0x0, ttl 64, id 50359, offset 0, flags [none], proto UDP (17), length 56)
    10.233.85.177.53873 > 169.254.25.10.53: [bad udp cksum 0x23d8 -> 0x278c!] 52759+ A? google.com. (28)
16:02:24.272707 IP (tos 0x0, ttl 64, id 56990, offset 0, flags [DF], proto UDP (17), length 82)
    169.254.25.10.53 > 10.233.85.177.53873: [bad udp cksum 0x23f2 -> 0xe468!] 52759* q: A? google.com. 1/0/0 google.com. [8s] A 216.58.211.110 (54)
16:02:24.272963 IP (tos 0x0, ttl 64, id 50360, offset 0, flags [none], proto UDP (17), length 56)
    10.233.85.177.54691 > 169.254.25.10.53: [bad udp cksum 0x23d8 -> 0x370f!] 47943+ AAAA? google.com. (28)
16:02:24.273141 IP (tos 0x0, ttl 64, id 56991, offset 0, flags [DF], proto UDP (17), length 94)
    169.254.25.10.53 > 10.233.85.177.54691: [bad udp cksum 0x23fe -> 0xf8e0!] 47943* q: AAAA? google.com. 1/0/0 google.com. [8s] AAAA 2a00:1450:400e:809::200e (66)

netstat output to show udp counters from linux stack. No InCsumErrors:

root@node:~# netstat -s -u
Udp:
    18 packets received
    0 packets to unknown port received
    0 packet receive errors
    18 packets sent
    0 receive buffer errors
    0 send buffer errors
UdpLite:
IpExt:
    InOctets: 2130
    OutOctets: 1101
    InNoECTPkts: 18

I tried both with checksum offload enabled and disable on eth0. Same behavior in both cases.

Shouldn't bad udp checksum detected by tcpdump mean that kernel will at some point drop udp packets before handing them over to the socket bound to nslookup?

score 0 · Answer 1 · edited Oct 07 '21 at 13:31

When you do nslookup google.com 8.8.8.8 everything looks fine. I think that this because since you are using coredns to resolve the domains, the packets run through a Service.

Service in k8s is a virtual entity. It appears as a forwarding rule in iptables. During forwarding process, source ip address is swapped out without recalculating the cksum, thus the error in tcpdump.

And according to RFC 768, UDP checksum is defined as following:

Checksum is the 16-bit one's complement of the one's complement sum of a pseudo header of information from the IP header, the UDP header, and the data

So as you see, IP header also is a part of a checksum and so is the source IP that is swapped out and this is changing the checksum.

Calculating the checksum is usually done using hardware acceleration of NIC before sending/receiving packets to a node. It would require a lot of computation just to compute checksums of all packets doing through iptables. And also it is useless because once you receive a packet on a node's network interface and you confirm that it's valid, you can be sure that it will stay valid within the node even after you forward it with iptables.

Does K8S setup some rules for linux kernel to ignore bad udp checksums for pod interfaces?

I know that e.g. loopback interface does not checksum packets (at least not by default). Maybe brigde interfaces (e.g. docker0 and veth*) also doesn't checksum. I tried to find a strong evidence to prove this statement but I didn't find anything to either prove it or disprove it.

Thanks. If I didnt misunderstand, then that explains why the udp checksum would be wrong on packets when they enter the pod but not why linux kernel in the pod doesnt discard them and reports Checksum Errors for the received udp packets. Does K8S setup some rules for linux kernel to ignore bad udp checksums for pod interfaces? — quandrione, Jan 08 '21 at 15:39
@quandrione I am not sure, but I know that e.g. loopback interface does not checksum packets (at least not by default). Maybe brigde interfaces (e.g. docker0 and veth*) also don't checksum. I tried to find a strong evidence to prove this statement but I didn't find anything to either prove it or disprove it. If you find something, please share it with me, I would love to know. — Matt, Jan 12 '21 at 10:28
And also, don't be affraid to leave a like/accept if you find my answer somewhat helpful. Thanks :D — Matt, Jan 12 '21 at 10:30

score 0 · Answer 2 · edited May 26 '22 at 20:32

0

try:

ethtool --offload eth0 rx off tx off
ethtool -K eth0 gso off

edited May 26 '22 at 20:32

Jeff Schaller

2,352
5
23
38

answered May 25 '22 at 06:24

robotiaga

315
2
11

DNS in K8S pod: tcpdump shows bad udp checksum but nslookup still works and UDP error counters don't increment

2 Answers2