
I'm new to Consul and am currently experimenting with it. I set up a cluster following this article: https://www.digitalocean.com/community/tutorials/an-introduction-to-using-consul-a-service-discovery-system-on-ubuntu-14-04 All the agents run in "server" mode. However, the cluster is rather unstable: when I run the "exec consul members" command on a server, I always see many other servers in "failed" status (sometimes they recover, but they soon fail again). I suspect I'm missing something in the configuration files.

I use AWS EC2 instances to run those consul agents.

Thanks! Yorick

sheny35

1 Answer


Check the Consul agent logs on one of the "failed" instances. If you see a repeated entry like [WARN] memberlist: Refuting a suspect message or [WARN] memberlist: Refuting a dead message, it means that:

  • your agents are able to communicate with the consul servers and register themselves;
  • your consul servers are not able to communicate back to the agents.
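
To see how often those warnings occur, you could count them in a saved copy of the agent log. This is only a minimal sketch: the function name `count_refutes` and the sample log lines are made up for illustration, and the pattern matches just the two message texts quoted above.

```python
import re
from collections import Counter

# Matches the two warning texts quoted in the answer above.
REFUTE_PATTERN = re.compile(r"memberlist: Refuting a (suspect|dead) message")


def count_refutes(log_lines):
    """Count 'Refuting a suspect/dead message' entries in an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = REFUTE_PATTERN.search(line)
        if match:
            # group(1) is either "suspect" or "dead"
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines; real Consul log lines carry more detail.
sample = [
    "[WARN] memberlist: Refuting a suspect message",
    "[INFO] some unrelated entry",
    "[WARN] memberlist: Refuting a dead message",
]
result = count_refutes(sample)
print("suspect:", result["suspect"], "dead:", result["dead"])
```

To scan a real log, pass an open file object instead of the sample list, since the function accepts any iterable of lines.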

You need to make sure that your security groups allow traffic between the agents and the servers, in both directions, on all the ports Consul uses. They are described here: https://www.consul.io/docs/agent/options.html#ports
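
A quick way to test the security groups is to attempt TCP connections from one node to another. The sketch below assumes Consul's default ports (8300 for server RPC, 8301 for Serf LAN, 8302 for Serf WAN) have not been overridden; the helper names are made up for illustration. It only covers TCP, while the gossip protocol also uses UDP on 8301 and 8302, so a passing result here is necessary but not sufficient.

```python
import socket

# Assumption: default Consul ports, not overridden in the agent configuration.
# Client-mode agents normally listen only on 8301; 8300 and 8302 apply to servers.
CONSUL_TCP_PORTS = {
    8300: "server RPC",
    8301: "Serf LAN gossip",
    8302: "Serf WAN gossip",
}


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_consul_ports(host, ports=CONSUL_TCP_PORTS, timeout=2.0):
    """Map each port number to whether it accepted a TCP connection."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Run `check_consul_ports("<private IP of another node>")` from a server against an agent, and again from the agent against the server, to confirm the rules work in both directions.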

For reference, here's what my security groups look like. You'll notice that consul agents are allowed to talk to consul servers, and consul servers are allowed to talk to consul agents and to each other, on all UDP and TCP ports. That is excessive, and I plan to restrict it to just the ports required by Consul.

[Screenshots: security group rules for the consul agents and the consul servers]

You also need to make sure that you're using internal EC2 IP addresses to communicate between your servers and clients. You don't want your gossip traffic to go out to the edge of the EC2 zone and back, which is what will happen if you use the public IP addresses.
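
As a sanity check, you could verify that every address you pass to Consul is a private one. The sketch below assumes your instances' internal addresses fall in the RFC 1918 ranges, which is typical for EC2 but not guaranteed for every VPC; the function name and the sample addresses are made up for illustration.

```python
import ipaddress

# RFC 1918 private IPv4 ranges.
RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(addr):
    """Return True if addr is an IPv4 address inside an RFC 1918 private range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETWORKS)


# Hypothetical addresses, e.g. the ones used when joining the cluster.
for addr in ("10.0.1.15", "54.210.1.2"):
    print(addr, "private" if is_rfc1918(addr) else "NOT private")
```

Any address reported as not private is worth replacing with the instance's internal address.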

Hope this helps.

Vince Bowdren
ebr