
When the Mesos scheduler (or slave) runs on a different machine than the Mesos master, it keeps trying to connect to the master but gets disconnected, and this cycle repeats continuously. How can I fix this?

vinodkone

2 Answers


Both the framework (and the slaves) and the master need to be able to talk to each other. In other words, if either endpoint advertises an address the other side can't reach (e.g., the loopback address 127.0.0.1), it won't work. If you want the master or slave to use a public IP, start it with the `--ip` flag (e.g., `mesos-master --ip=<address>` or `mesos-slave --ip=<address>`). For the framework, set LIBPROCESS_IP in its environment.
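As a rough illustration, here is a minimal Python sketch of the two settings above: it refuses loopback addresses and then either builds a daemon command line with `--ip` or an environment with LIBPROCESS_IP for a framework process. The helper names (`is_advertisable`, `daemon_command`, `framework_env`) are hypothetical, not part of Mesos, and any flags besides `--ip` are left to the caller.

```python
import ipaddress
import os


def is_advertisable(ip: str) -> bool:
    """Return True if `ip` is not a loopback address.

    A loopback address such as 127.0.0.1 (or ::1) is only reachable from the
    same host, so a master, slave, or framework advertising it cannot be
    reached from another machine.
    """
    return not ipaddress.ip_address(ip).is_loopback


def daemon_command(binary: str, ip: str, extra_flags=()) -> list:
    """Build an argv for mesos-master / mesos-slave bound to `ip` via --ip.

    `extra_flags` is a placeholder for whatever other flags you already pass.
    """
    if not is_advertisable(ip):
        raise ValueError(f"{ip} is not reachable from other hosts")
    return [binary, f"--ip={ip}", *extra_flags]


def framework_env(ip: str, base=None) -> dict:
    """Return a copy of the environment with LIBPROCESS_IP set to `ip`."""
    if not is_advertisable(ip):
        raise ValueError(f"{ip} is not reachable from other hosts")
    env = dict(os.environ if base is None else base)
    env["LIBPROCESS_IP"] = ip
    return env
```

For example (addresses are made up), `daemon_command("mesos-slave", "10.0.0.5", ["--master=10.0.0.4:5050"])` gives an argv you could hand to `subprocess.Popen`, and `framework_env("10.0.0.5")` gives an `env=` dict for launching the scheduler.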

vinodkone
  • If you bind to the 'public' IP, there is an assumption that IP is on a private network, right? Otherwise couldn't any slave connect to the master, or vice versa? – Peter Becich Aug 18 '14 at 22:56
  • That is correct. It is only required that the IPs of master and slave are visible to each other. It doesn't have to be public to the rest of the world. – vinodkone Aug 19 '14 at 03:43
  • What do you mean by using the `--ip` flag? Where is it used? – Chetan Bhasin Jul 10 '15 at 12:44
  • Where do you set LIBPROCESS_IP in the env? Is it in the Marathon JSON or on the Marathon host? Does Marathon have to be started in bridge or host mode? – Dimitri Kopriwa Nov 11 '15 at 11:23

We need a bit more information to go on, but it sounds like you aren't advertising the slave on an IP the master can reach.

As mentioned above, a slave will happily advertise its IP address as 127.0.0.1/localhost, which obviously isn't reachable from the master unless they're on the same server. This should show up in the master and slave logs, so check those.
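If the logs are long, a quick scan for loopback addresses can help. The sketch below is a hypothetical helper, not a Mesos tool, and it does not assume any particular log format: it just pulls out every dotted-quad IPv4 address (optionally with `:port`) and reports the ones in 127.0.0.0/8.

```python
import ipaddress
import re

# Dotted-quad IPv4 address, optionally followed by ":port".
IPV4_WITH_PORT = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})(?::(\d+))?\b")


def loopback_endpoints(lines):
    """Yield (line_number, ip, port_or_None) for each loopback IPv4 address."""
    for lineno, line in enumerate(lines, start=1):
        for match in IPV4_WITH_PORT.finditer(line):
            ip, port = match.group(1), match.group(2)
            try:
                addr = ipaddress.IPv4Address(ip)
            except ValueError:
                continue  # e.g. "999.1.1.1" looks like an address but isn't
            if addr.is_loopback:
                yield lineno, ip, int(port) if port else None
```

Usage might look like `with open(path) as f: print(list(loopback_endpoints(f)))`, where `path` is wherever your slave or master writes its log. Any hit on a line describing the address the process is bound to or registering with is a strong hint that the `--ip` flag is missing.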

Firewalls can also be an issue, so try again with them disabled to rule them out.
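Before disabling a firewall, a plain TCP connect test from each machine to the other can show whether traffic gets through at all. This is a minimal sketch using only the Python standard library; it assumes the default ports (5050 for the master, 5051 for the slave), so adjust them if you changed them. Run it in both directions, since the master also needs to connect back to the slave and the framework.

```python
import socket

MESOS_MASTER_PORT = 5050  # default mesos-master port
MESOS_SLAVE_PORT = 5051   # default mesos-slave port


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out (often a firewall silently dropping
        # packets), or host unreachable.
        return False
```

For instance, on the slave host `can_connect("10.0.0.4", MESOS_MASTER_PORT)` and on the master host `can_connect("10.0.0.5", MESOS_SLAVE_PORT)` (example addresses) should both be `True`. A timeout rather than an immediate refusal usually points at a firewall.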

Rasputnik