
We are using ConnectX-5 100GbE Ethernet cards in our servers, which are connected to each other through a Mellanox switch, and the WeaveNet CNI plugin on our Kubernetes cluster. When we run tests between the hosts with the iperf tool using the following commands, we get the full 100 Gbps link speed.

# server host
host1 $ iperf -s -P8
# client host
host2 $ iperf -c <host_ip> -P8
Result: 98.8 Gbps transfer speed

We also get the same result when we run the same test and command between two Docker containers on the same hosts.

# server host
host1$ docker run -it -p 5001:5001 ubuntu:latest-with-iperf iperf -s -P8 
# client host
host2 $ docker run -it -p 5001:5001 ubuntu:latest-with-iperf iperf -c <host_ip> -P8
Result: 98.8 Gbps transfer speed

But when we create two different deployments on those hosts (host1, host2) with the same image and run the same test through the service IP (we created a Kubernetes Service using the YAML below), which redirects traffic to the server pod, we get only 2 Gbps. We also ran the same test using the pod's cluster IP and the service's cluster domain, but the results are the same.

kubectl create deployment iperf-server --image=ubuntu:latest-with-iperf  # afterwards we add the affinity (host1) and container port sections to the YAML
kubectl create deployment iperf-client --image=ubuntu:latest-with-iperf  # afterwards we add the affinity (host2) and container port sections to the YAML

kind: Service
apiVersion: v1
metadata:
  name: iperf-server
  namespace: default
spec:
  ports:
    - name: iperf
      protocol: TCP
      port: 5001
      targetPort: 5001
  selector:
    name: iperf-server
  clusterIP: 10.104.10.230
  type: ClusterIP
  sessionAffinity: None
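
For completeness, a minimal sketch of what the edited iperf-server Deployment might look like after adding the node pinning and container port (this is an assumption reconstructed from the description above: the label key is chosen to match the Service selector, a plain nodeSelector stands in for the affinity section, and the client Deployment is analogous but pinned to host2):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: iperf-server
spec:
  replicas: 1
  selector:
    matchLabels:
      name: iperf-server              # matches the Service selector above
  template:
    metadata:
      labels:
        name: iperf-server
    spec:
      nodeSelector:
        kubernetes.io/hostname: host1   # pin the server pod to host1
      containers:
        - name: iperf-server
          image: ubuntu:latest-with-iperf
          command: ["iperf", "-s", "-P8"]
          ports:
            - containerPort: 5001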

TL;DR: the scenarios we tested:

  • host1 (Ubuntu 20.04, Mellanox driver installed) <--------> host2 (Ubuntu 20.04, Mellanox driver installed) = 98.8 Gbps
  • container1-on-host1 <--------> container2-on-host2 = 98.8 Gbps
  • Pod1-on-host1 <-------> Pod2-on-host2 (using the pod cluster IP) = 2 Gbps
  • Pod1-on-host1 <-------> Pod2-on-host2 (using the service cluster IP) = 2 Gbps
  • Pod1-on-host1 <-------> Pod2-on-host2 (using the service cluster domain) = 2 Gbps
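
For anyone reproducing the pod-to-pod scenarios, the client can be driven with something like the following (a sketch: the deploy/<name> exec target needs a reasonably recent kubectl, cluster.local is the default cluster domain, and the server pod IP is a placeholder):

# against the service's cluster domain
kubectl exec -it deploy/iperf-client -- iperf -c iperf-server.default.svc.cluster.local -P8
# against the service cluster IP
kubectl exec -it deploy/iperf-client -- iperf -c 10.104.10.230 -P8
# against the server pod's cluster IP
kubectl exec -it deploy/iperf-client -- iperf -c <server_pod_ip> -P8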

We need to get 100 Gbps on pod-to-pod communication. What could be causing this issue?

Update1:

  • When I check htop inside the pods during the iperf test, all 112 CPU cores are visible and none of them is anywhere near saturated.
  • When I add the hostNetwork: true key to the deployments, the pods can reach up to 100 Gbps bandwidth.
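
The hostNetwork change is a one-line addition to the pod spec of the deployments above; a minimal sketch of where it goes (the dnsPolicy line is an assumption, commonly set alongside hostNetwork so in-cluster DNS keeps working):

spec:
  template:
    spec:
      hostNetwork: true                    # pod uses the node's network stack, bypassing the Weave overlay
      dnsPolicy: ClusterFirstWithHostNet   # usually set together with hostNetwork
      containers:
        - name: iperf-server
          image: ubuntu:latest-with-iperf
          command: ["iperf", "-s", "-P8"]
          ports:
            - containerPort: 5001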

Comments:

  • If you do `htop` with 'detailed CPU statistics' on, can you see much CPU use? I'm thinking 'system' or 'softirq'. Another wild guess is that perhaps there is a stateful NAT layer in between? That can cause CPU issues, but it is more of a problem with many connections than with packets. (BTW: you can edit your question with the details, as opposed to replying by comment.) – Halfgaar Nov 24 '22 at 11:11
  • @Halfgaar thanks for your reply. I edited my question. We don't have any custom iptables rules on the hosts; iptables is fully managed by WeaveNet. – Zekeriya Akgül Nov 24 '22 at 11:47
  • I don't know about K8s, but Docker for instance manipulates iptables, so you may have some inadvertently? But you say you fixed it with `hostNetwork: true`? – Halfgaar Nov 24 '22 at 12:14
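
Following up on the softirq suggestion in the comments, a quick way to watch per-CPU system/softirq time while iperf is running (a sketch; mpstat comes from the sysstat package):

# per-CPU utilisation breakdown, including %sys and %soft, refreshed every second
mpstat -P ALL 1

# raw softirq counters; watch the NET_RX/NET_TX rows grow during the test
watch -d -n1 cat /proc/softirqs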

1 Answer


We figured this out by disabling the encryption on WeaveNet; rebooting the servers afterwards did the trick. Thanks for this article.
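
For anyone hitting the same limit, a rough sketch of how to check and remove the encryption, assuming the standard Weave Net DaemonSet install in kube-system where encryption is driven by the WEAVE_PASSWORD environment variable (pod and secret names depend on your installation):

# check whether the overlay is running encrypted (look for "Encryption: enabled")
kubectl exec -n kube-system <weave-net-pod> -c weave -- /home/weave/weave --local status

# encryption is switched on by the WEAVE_PASSWORD env var on the weave container;
# remove that env entry (and, as in our case, reboot the nodes) to turn it off
kubectl -n kube-system edit daemonset weave-net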