
Team,

I have a Mellanox ConnectX-4 NIC on a k8s worker node that hosts a NodeLocal DNS pod. The NodeLocal DNS pod times out when it tries to connect to the CoreDNS service in the cluster.

The same setup works on Ubuntu 18.04.

Failing versions:

k8s v1.13.5 Baremetal
Ubuntu 20.04.4 LTS
kernel 5.4.0-100-generic
docker://19.3.13

Working versions:

k8s v1.13.5 Baremetal
Ubuntu 18.04.2 LTS
kernel 4.15.0-45-generic
docker://18.9.2

Any hints on how I can debug this? The logs give me no clue.

The errors come from the NodeLocal DNS pod logs:

A: dial tcp 100.60.3.4:53: i/o timeout

Here 100.60.3.4 is the CoreDNS service IP. It is pingable from the NodeLocal DNS pod, but connections to the DNS port (53) time out.
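Ping only proves that ICMP gets through, while the error says `dial tcp`, so the failing step is a TCP connection to port 53. A quick way to separate the two is to check whether a TCP handshake to 100.60.3.4:53 completes. The sketch below is a hypothetical helper (`tcp_probe` is not part of any tool); it assumes bash with its `/dev/tcp` pseudo-device and coreutils `timeout` are available wherever you run it, which may not hold inside a minimal DNS container image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: probe whether a TCP connection to host:port completes
# within a timeout. Relies on bash's /dev/tcp pseudo-device and coreutils
# `timeout`, so no nc or dig is required.
# Usage: tcp_probe HOST PORT [SECONDS]
tcp_probe() {
  local host=$1 port=$2 secs=${3:-3}
  # Opening fd 3 on /dev/tcp/HOST/PORT performs the TCP handshake;
  # `timeout` kills the attempt if the handshake hangs.
  if timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "tcp $host:$port reachable"
    return 0
  else
    echo "tcp $host:$port unreachable (refused or timed out after ${secs}s)"
    return 1
  fi
}
```

For example, `tcp_probe 100.60.3.4 53` from the node or a pod on it. If this times out while ping succeeds, the problem sits below DNS, in the TCP path itself. If `dig` is installed, `dig +tcp @100.60.3.4 kubernetes.default.svc.cluster.local` exercises the same path with a real query.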

AhmFM


It was an interoperability issue, which we fixed by disabling TX checksum offload on the node's NIC (RX checksum offload stays on). After running the command below, pod networking started working. This happened only with the Mellanox ConnectX-4; we did not see it with the ConnectX-5.

ethtool -K ens1 rx on tx off
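`ethtool -K` (uppercase) changes offload features, while `ethtool -k` (lowercase) only shows them. The change applies to the running configuration and does not persist across a reboot or driver reload by itself, so it can be worth checking the state afterwards. The sketch below is a hypothetical helper (`checksum_offload_state` is an illustrative name) that reads `ethtool -k` output on stdin and reports just the `rx-checksumming` and `tx-checksumming` lines. `ens1` is the interface name from this answer and will differ on other nodes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize rx/tx checksum offload from `ethtool -k` output.
# Reads stdin so it can be tested offline; on a node run:
#   ethtool -k ens1 | checksum_offload_state
checksum_offload_state() {
  # Split "name: value" lines on a colon plus optional spaces. Indented
  # sub-feature lines (e.g. "\ttx-checksum-ipv4") do not match the top-level
  # names. Keep only the first word of the value, so "off [fixed]" -> "off".
  awk -F': *' '
    $1 == "rx-checksumming" { split($2, a, " "); rx = a[1] }
    $1 == "tx-checksumming" { split($2, b, " "); tx = b[1] }
    END { printf "rx=%s tx=%s\n", rx, tx }
  '
}
```

After the fix above, the expected summary for the interface is `rx=on tx=off`.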