We run a cluster of 3 machines in non-swarm mode, with a number of links set up between containers, and noticed a very strange problem: containers on 2 specific machines cannot communicate. Containers on Machine A can talk to those on Machine B, but cannot reach the ones on Machine C. Containers on Machine B and Machine C, however, connect to each other without any problem.

After reading the logs we noticed that the weave containers on Machine A and Machine C (the ones with connection issues) cannot reach each other.

The log is full of messages like:

INFO: 2017/04/11 08:33:35.169670 ->[XXX.XXX.XXX.XXX:6783] attempting connection
INFO: 2017/04/11 08:33:35.187072 ->[XXX.XXX.XXX.XXX:6783] connection shutting down due to error during handshake: Unable to decrypt TCP msg
INFO: 2017/04/11 08:42:39.024325 ->[XXX.XXX.XXX.XXX:49040] connection accepted
INFO: 2017/04/11 08:42:39.035681 ->[XXX.XXX.XXX.XXX:49040] connection shutting down due to error during handshake: Unable to decrypt TCP msg
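
To see which peers the failures concentrate on, the log lines can be tallied per remote address. The following is only a minimal sketch: it assumes every line has the same shape as the excerpt above, and the `192.0.2.30` address and the `handshake_failures` helper are hypothetical stand-ins (the real addresses are redacted).

```python
import re
from collections import Counter

# Matches lines shaped like the excerpt above:
#   "INFO: <date> <time> ->[<host>:<port>] <message>"
LINE_RE = re.compile(
    r"^(?P<level>\w+): (?P<date>\S+) (?P<time>\S+) "
    r"->\[(?P<host>[^\]]+):(?P<port>\d+)\] (?P<msg>.*)$"
)


def handshake_failures(lines):
    """Count handshake errors per remote host, ignoring the port."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and "error during handshake" in match.group("msg"):
            counts[match.group("host")] += 1
    return counts


# Hypothetical sample: same messages as the excerpt, placeholder address.
sample_log = [
    "INFO: 2017/04/11 08:33:35.169670 ->[192.0.2.30:6783] attempting connection",
    "INFO: 2017/04/11 08:33:35.187072 ->[192.0.2.30:6783] connection shutting down "
    "due to error during handshake: Unable to decrypt TCP msg",
    "INFO: 2017/04/11 08:42:39.024325 ->[192.0.2.30:49040] connection accepted",
    "INFO: 2017/04/11 08:42:39.035681 ->[192.0.2.30:49040] connection shutting down "
    "due to error during handshake: Unable to decrypt TCP msg",
]

failures = handshake_failures(sample_log)
```

Outgoing (port 6783) and incoming (ephemeral port) attempts are grouped under the same host here, since in the excerpt both directions fail with the same error.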

What is even more bizarre, no firewall rules are set up, the machines are perfectly reachable, and I can even telnet to the other weave daemon (on port 6783) and get the "weave" string in response. We tried rebooting, redeploying the cluster and even recycling the machine, with no luck: some bug or problem prevents weave on these specific machines from communicating.
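
For reference, the telnet check can be reproduced with a few lines of Python. This is a rough sketch, not a weave client: `read_greeting` is a hypothetical helper that only opens a TCP connection and returns whatever bytes the peer sends first. Receiving "weave" this way shows the port is reachable, but it says nothing about the later handshake step where the log reports "Unable to decrypt TCP msg".

```python
import socket


def read_greeting(host, port=6783, timeout=3.0, max_bytes=16):
    """Connect over TCP and return the first bytes sent by the peer.

    Reads until max_bytes are received, the peer closes the connection,
    or the timeout expires, whichever comes first.
    """
    received = b""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while len(received) < max_bytes:
            try:
                part = sock.recv(max_bytes - len(received))
            except socket.timeout:
                break  # peer is waiting for us; keep what we have
            if not part:
                break  # peer closed the connection
            received += part
    return received


# Example call (the host name is a placeholder, not a real machine):
# print(read_greeting("machine-c.example", 6783))
```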

We would be really thankful to you, SO users, if you would help or hint in any way.

Versions of software, just in case.

  • OS: Ubuntu 16.04.2 x64
  • Docker: 1.11.2-cs5, build d364ea1
  • weave: 1.6.2

EDIT: weave status X outputs (redacted):

  • weave status connections: lists connections to the hosts whose containers I can reach as "established encrypted", and reports "failed Unable to decrypt TCP msg, retry: 2017-04-11 13:18:07.695016283 +0000 UTC" for the problematic host

  • weave status peers: Lists only accessible hosts

  • weave status report: Just a JSON version of the same data

Anton
  • 1) Could you share the output of the below commands on the various nodes? - `weave status connections` - `weave status peers` - `weave report` 2) "Unable to decrypt TCP msg" suggests you are using Weave Net in encrypted mode (but messages couldn't be decrypted, see: https://github.com/weaveworks/mesh/blob/master/protocol_crypto.go#L200). Is this indeed the case? 3) Weave Net 1.6.2 is a relatively old version and many bug fixes and optimisations have been made since then. Would you be able to upgrade to the latest version (1.9.4) and confirm you still have this issue? – Marc Carré Apr 11 '17 at 11:43
  • 1) Not sure which container I should run this in. Please mind these are set up by Docker Cloud, not me 2) Yes, that's what Docker Cloud defaults to 3) I know that, but again, as hosts are provisioned by Docker Cloud, we have no control over the versions, I am afraid. Moreover, this is the first time we have experienced such issues with the current version of Docker / Weave. – Anton Apr 11 '17 at 12:43
  • Commands mentioned in 1) should be run on the host (not in a container) where the `weave` script is typically installed, but I am not sure you have access to this under Docker Cloud (I am not familiar with it, as you guessed :-)). – Marc Carré Apr 11 '17 at 12:57
  • Ok, so I managed to find weave and run the status. As the output is quite verbose, I have provided only relevant entries above – Anton Apr 11 '17 at 13:16
  • As discussed on Slack (https://weave-community.slack.com/archives/C2ND76PAA/p1491920641854457) the error message from the log and the output of the `weave` commands would indicate that `weave` has been started with different passwords on the various hosts. Please let us know if the suggested commands resolved it :-) – Marc Carré Apr 13 '17 at 10:54
  • Two issues here: passwords are managed by Docker Cloud, so I really have no control over those, AND it doesn't quite match up, as A-B connects, A-C connects, B-C doesn't. If one node had a wrong password, it wouldn't be able to talk to any other node... – Anton Apr 14 '17 at 12:31
  • I would suggest reaching out to Docker Cloud to ensure things are set up properly on their side. – Marc Carré Apr 18 '17 at 10:26
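
The objection in the last exchange can be made concrete. Assuming a simplified model where each host has exactly one password and two hosts connect only if their passwords match, "can connect" would have to be transitive. The sketch below (the `transitivity_violations` helper is hypothetical, and the link set uses the labelling from the question body, where A-B and B-C connect but A-C does not) lists the triples that break this rule.

```python
from itertools import permutations


def transitivity_violations(hosts, connected_pairs):
    """Return triples (x, y, z) where x-y and y-z connect but x-z does not.

    Under a one-password-per-host model such a triple cannot occur,
    because x == y and y == z would imply x == z.
    """
    links = {frozenset(pair) for pair in connected_pairs}
    violations = []
    for x, y, z in permutations(hosts, 3):
        if x >= z:
            continue  # report each unordered end pair only once
        if (
            frozenset((x, y)) in links
            and frozenset((y, z)) in links
            and frozenset((x, z)) not in links
        ):
            violations.append((x, y, z))
    return violations


# Connectivity as described in the question body.
observed = transitivity_violations(["A", "B", "C"], [("A", "B"), ("B", "C")])
```

A non-empty result means the observed pattern cannot be produced by one mismatched password alone under this model, so something else (or a model more complex than one password per host) would have to be involved.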

0 Answers