We are using non-swarm mode for our cluster of 3 machines, with a bunch of links set up between containers, and noticed a very strange problem: containers on 2 specific machines cannot communicate. Containers on Machine A can talk to those on Machine B, but cannot reach the ones on Machine C. However, containers on Machine B and Machine C are perfectly capable of connecting to each other.
After reading the logs, we noticed that the weave containers on Machine A and Machine C (the ones with issues connecting) cannot reach each other.
The log is full of messages like:
INFO: 2017/04/11 08:33:35.169670 ->[XXX.XXX.XXX.XXX:6783] attempting connection
INFO: 2017/04/11 08:33:35.187072 ->[XXX.XXX.XXX.XXX:6783] connection shutting down due to error during handshake: Unable to decrypt TCP msg
INFO: 2017/04/11 08:42:39.024325 ->[XXX.XXX.XXX.XXX:49040] connection accepted
INFO: 2017/04/11 08:42:39.035681 ->[XXX.XXX.XXX.XXX:49040] connection shutting down due to error during handshake: Unable to decrypt TCP msg
What is even more bizarre, no firewall rules are set up, the machines are perfectly reachable, and I can even telnet to the other weave daemon (using port 6783) and get a "weave" string in response. We tried rebooting, redeploying the cluster and even recycling the machine, with no luck: some bug or problem prevents weave on these specific machines from communicating.
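For anyone who wants to repeat the reachability check without telnet, here is a minimal sketch of the same test in Python. It only opens a TCP connection to the weave port and returns the first bytes the peer sends; it does not attempt the encrypted handshake. The function name `read_banner` and its parameters are made up for illustration.

```python
import socket


def read_banner(host, port=6783, timeout=5.0, nbytes=16):
    """Connect to host:port over TCP and return the first bytes received.

    Raises an OSError subclass (for example socket.timeout or
    ConnectionRefusedError) if the peer cannot be reached or stays silent.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        return sock.recv(nbytes)
```

Calling `read_banner("machine-c.example")` from Machine A (the host name is a placeholder) should return bytes starting with the same "weave" string that telnet shows, which confirms plain TCP reachability but says nothing about why decryption fails.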
We would be really thankful to you, SO users, if you would help or hint in any way.
Software versions, just in case:
- OS: Ubuntu 16.04.2 x64
- Docker: 1.11.2-cs5, build d364ea1
- weave: 1.6.2
EDIT: weave status X outputs (redacted):
- weave status connections: Lists connections as "established encrypted" to hosts I can access containers on, and says "failed Unable to decrypt TCP msg, retry: 2017-04-11 13:18:07.695016283 +0000 UTC" for the problematic host
- weave status peers: Lists only accessible hosts
- weave status report: Just a JSON version with the same data