
We are hosting a project on Google Cloud Engine, with a TCP load balancer in front of the cluster and its nodes. For the past week, customers have been reporting the error: "Connection Lost to the Server".

tcpdump on the IP of the load balancer:

# tcpdump -v host X.X.X.X | grep admin
p5B3805D8.dip0.t-ipconnect.de > X.X.X.X.bc.googleusercontent.com:
ICMP host p5B3805D8.dip0.t-ipconnect.de unreachable - admin prohibited filter, length 36

iptables output from a default node:

# iptables -nvL
Chain INPUT (policy ACCEPT 11 packets, 851 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 3 packets, 156 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DOCKER     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  docker0 docker0  0.0.0.0/0            0.0.0.0/0           

Chain OUTPUT (policy ACCEPT 8 packets, 2130 bytes)
 pkts bytes target     prot opt in     out     source               destination         
 284M  104G KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain DOCKER (1 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain KUBE-SERVICES (1 references)
 pkts bytes target     prot opt in     out     source               destination

After that I checked the firewall, but all important ports are allowed and nothing should be dropped. We get these messages on the whole cluster. Does anyone have a recommendation for what I should check to resolve this problem? Any help would be greatly appreciated.

A.Rempel
  • Does the error show that customers are not able to connect at all, or that established connections are being dropped? E.g., if you telnet to the port, does it show that the connection is established? I do not believe your firewall rules are changing dynamically. Looking at the [pod logs](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pod-replication-controller/) might shed additional light. – Carlos Mar 30 '17 at 20:57
  • Many thanks for your response. Customers get the error from our software, which is managed in the Google project. The telnet connection to port 80 of the web server was successful, and for port 443 I used `openssl s_client -connect domain:443`, which was also successful. To me, it looks like the load balancer is dropping the packets. The pod logs don't show anything relevant to the problem. I think the error lies in the load balancer. – A.Rempel Apr 03 '17 at 08:21
  • This is a difficult one because the problem is intermittent and only the HTTP LB offers [logs in alpha](https://cloud.google.com/compute/docs/load-balancing/http/). One of the things to keep an eye on is the [TCP keepalives](https://cloud.google.com/compute/docs/troubleshooting). Idle connections are disconnected after ten minutes. – Carlos Apr 03 '17 at 23:29
  • Additionally, if you are using one of these common applications in the backend, you might want to consider installing the [respective plugin](https://cloud.google.com/monitoring/agent/plugins/) as well as monitoring the cluster via [Stackdriver](https://cloudplatform.googleblog.com/2015/12/monitoring-Container-Engine-with-Google-Cloud-Monitoring.html). – Carlos Apr 03 '17 at 23:30
  • Were you ever able to solve this issue? If so, please consider posting a self-answer so that the community can benefit. – Faizan May 31 '17 at 19:48
  • The problem was solved with help from Google Support. Optimizing the Docker image we use eliminated the error. Switching from the network load balancer to the HTTPS load balancer also helped. – A.Rempel Jun 29 '17 at 11:56
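
For anyone following up on the keepalive suggestion in the comments: a minimal sketch of enabling TCP keepalive on a socket so that probes go out before the ten-minute idle cutoff mentioned above. The helper name `enable_keepalive` and the 300/60/5 values are assumptions chosen for illustration, not settings from this thread, and the tuning constants are only available on some platforms (e.g. Linux). This is not the fix that was eventually applied here.

```python
import socket


def enable_keepalive(sock, idle_s=300, interval_s=60, probes=5):
    """Enable TCP keepalive on sock (hypothetical helper, illustrative values)."""
    # Ask the kernel to send keepalive probes on an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options are platform-specific, so set them only if present.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe (kept below ten minutes).
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between successive probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is considered dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
    print("keepalive enabled:", enabled)
    s.close()
```

The same effect can usually be achieved system-wide through kernel settings rather than per socket; which is appropriate depends on whether the application code can be changed.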

0 Answers