Performance issues with weave networking on Kubernetes cluster

Question

I created a Kubernetes (v1.6.1) cluster on AWS with one master and two slave nodes, then spun up a mysql instance using helm and deployed a simple Django web app that queries the latest five rows from the database and displays them. For my web service I specified 'type: LoadBalancer', which creates an ELB on AWS.

If I use 'weave' networking and scale my web app to at least two replicas, I begin experiencing inconsistent response times: most requests are reasonably fast (about 0.1-0.2 s), but 20-40% of requests take significantly longer (3-5 s, sometimes even more than 15 s). However, if I switch to 'flannel' networking, everything is fast, even with 20-30 replicas of the web app. All machines have enough resources, so that's not the problem.
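
One way to put numbers on the pattern above is to time a batch of requests against the service endpoint and look at the share of slow ones. The following is only a minimal sketch: the names `time_request` and `summarize`, the 1-second slow threshold, and the default of 50 requests are assumptions for illustration, and the URL has to be supplied on the command line (for example the ELB hostname).

```python
import statistics
import sys
import time
import urllib.request


def time_request(url, timeout=30.0):
    """Return the wall-clock seconds taken to fetch `url` once."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def summarize(latencies, slow_threshold=1.0):
    """Summarize latencies (in seconds); `slow_threshold` is an assumed cutoff."""
    ordered = sorted(latencies)
    slow = [t for t in ordered if t >= slow_threshold]
    return {
        "count": len(ordered),
        "median": statistics.median(ordered),
        "max": ordered[-1],
        "slow_fraction": len(slow) / len(ordered),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python measure.py http://<elb-hostname>/ [requests]
    url = sys.argv[1]
    total = int(sys.argv[2]) if len(sys.argv) > 2 else 50
    print(summarize([time_request(url) for _ in range(total)]))
```

Running the same script against the ELB endpoint and against a pod or node address directly would show whether the slow requests appear only on the path through the load balancer.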

I tried debugging to find out what's causing the delay, and the best explanation I have is that AWS ELB doesn't work well with 'weave'. Has anyone experienced similar issues? What could be the problem? Please let me know if I should provide any additional information.

P.S. I'm new to using Kubernetes.

A probable cause: pods on different EC2 nodes are not able to communicate with each other. Try to confirm whether this is the issue by pinging or curling from one container to another container on a different EC2 instance. — user3098466, Jun 27 '17 at 09:21
I tried pinging the mysql container from both slave nodes and from inside Docker containers on both nodes. Results:
- from the node where the mysql container runs: rtt min/avg/max/mdev = 0.047/0.061/0.111/0.026 ms; from a container on the same node: round-trip min/avg/max/stddev = 0.055/0.086/0.147/0.032 ms
- from the other slave node: rtt min/avg/max/mdev = 1.199/1.537/2.471/0.471 ms; from a container on that node: round-trip min/avg/max/stddev = 1.250/1.359/1.429/0.059 ms
What puzzles me is that I got pretty consistent response times, unlike what I get when curling through the ELB endpoint. — yuzefovich, Jun 27 '17 at 17:46
Ping times look reasonable. Have you checked which configuration weave is running in? Weave supports two protocols: sleeve and fast datapath. Fast datapath is faster and is the default. However, weave might fall back to sleeve if fast datapath does not work for some reason. Check that weave is indeed using the fast datapath protocol. You will need to open some weave-specific ports in your worker node security group. — user3098466, Jun 28 '17 at 06:20
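
The check suggested above can be sketched as follows. The `weave status connections` command lists peer connections, and each established connection is expected to name the protocol in use (`fastdp` or `sleeve`). The helper `count_datapaths` and the sample text are hypothetical, and the exact line layout is an assumption, so the sketch only looks for those tokens on lines marked `established`. As far as I know, Weave Net peers use TCP 6783 and UDP 6783/6784, so those are the ports to check in the security group.

```python
from collections import Counter


def count_datapaths(status_output):
    """Count established connections per protocol token (fastdp or sleeve)."""
    counts = Counter()
    for line in status_output.splitlines():
        fields = line.split()
        if "established" not in fields:
            continue  # skip pending, retrying, or failed connections
        for mode in ("fastdp", "sleeve"):
            if mode in fields:
                counts[mode] += 1
    return counts


# Made-up sample resembling the command's output; real lines may differ.
SAMPLE = """\
-> 10.0.1.12:6783 established fastdp aa:bb:cc:dd:ee:01(node-1)
<- 10.0.2.34:51234 established sleeve aa:bb:cc:dd:ee:02(node-2)
-> 10.0.3.56:6783 failed connection refused, retry: later
"""

if __name__ == "__main__":
    print(dict(count_datapaths(SAMPLE)))
```

Any non-zero `sleeve` count would suggest that some peers have fallen back from fast datapath.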
@yuzefovich, the following benchmarks were done on AWS: https://www.weave.works/blog/weave-net-performance-fast-datapath/ and the steps to reproduce them are here: https://github.com/weaveworks/weave/blob/5b060634427f4e65ad1873ef9871f94f04b4486a/docs/benchmarks.md#2017-02-16-fast-datapath-encryption It is probably worth trying something similar and seeing how that goes. If you still see poor performance, the next step would be to run tcpdump to investigate what is happening. — Marc Carré, Jun 30 '17 at 11:38
