I have installed and configured Mesos and Marathon. Whenever I try to schedule an application, it remains in the 'Waiting' state, which seems to indicate that Marathon is waiting for offers from Mesos.

When I check the logs in Mesos, I see the following:

I0425 20:22:10.313910  4279 master.cpp:2231] Received SUBSCRIBE call for framework 'chronos-2.4.0' at scheduler-07d9654e-5c40-4172-a25d-97c565b5765d@127.0.1.1:50892
I0425 20:22:10.313987  4279 master.cpp:2302] Subscribing framework chronos-2.4.0 with checkpointing enabled and capabilities [  ]
I0425 20:22:10.313994  4279 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0001 (chronos-2.4.0) at scheduler-07d9654e-5c40-4172-a25d-97c565b5765d@127.0.1.1:50892 already subscribed, resending acknowledgement
W0425 20:22:10.314007  4279 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0001 (chronos-2.4.0) at scheduler-07d9654e-5c40-4172-a25d-97c565b5765d@127.0.1.1:50892
E0425 20:22:10.314193  4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected
I0425 20:22:11.226884  4284 master.cpp:2231] Received SUBSCRIBE call for framework 'marathon' at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57@127.0.1.1:35928
I0425 20:22:11.226959  4284 master.cpp:2302] Subscribing framework marathon with checkpointing enabled and capabilities [  ]
I0425 20:22:11.226969  4284 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57@127.0.1.1:35928 already subscribed, resending acknowledgement
W0425 20:22:11.226982  4284 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57@127.0.1.1:35928
E0425 20:22:11.227226  4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected
I0425 20:22:12.113598  4281 http.cpp:312] HTTP GET for /master/state from 192.0.2.1:49698 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'
I0425 20:22:12.314221  4286 master.cpp:2231] Received SUBSCRIBE call for framework 'chronos-2.4.0' at scheduler-07d9654e-5c40-4172-a25d-97c565b5765d@127.0.1.1:50892
I0425 20:22:12.314304  4286 master.cpp:2302] Subscribing framework chronos-2.4.0 with checkpointing enabled and capabilities [  ]
I0425 20:22:12.314312  4286 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0001 (chronos-2.4.0) at scheduler-07d9654e-5c40-4172-a25d-97c565b5765d@127.0.1.1:50892 already subscribed, resending acknowledgement
W0425 20:22:12.314337  4286 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0001 (chronos-2.4.0) at scheduler-07d9654e-5c40-4172-a25d-97c565b5765d@127.0.1.1:50892
E0425 20:22:12.314524  4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected
I0425 20:22:13.081887  4284 master.cpp:2231] Received SUBSCRIBE call for framework 'marathon' at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57@127.0.1.1:35928
I0425 20:22:13.081964  4284 master.cpp:2302] Subscribing framework marathon with checkpointing enabled and capabilities [  ]
I0425 20:22:13.081987  4284 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57@127.0.1.1:35928 already subscribed, resending acknowledgement
W0425 20:22:13.082005  4284 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57@127.0.1.1:35928
E0425 20:22:13.082314  4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected
I0425 20:22:13.221590  4282 master.cpp:2231] Received SUBSCRIBE call for framework 'marathon' at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57@127.0.1.1:35928
I0425 20:22:13.221664  4282 master.cpp:2302] Subscribing framework marathon with checkpointing enabled and capabilities [  ]
I0425 20:22:13.221674  4282 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57@127.0.1.1:35928 already subscribed, resending acknowledgement
W0425 20:22:13.221688  4282 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57@127.0.1.1:35928
E0425 20:22:13.222162  4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected
I0425 20:22:14.412215  4286 master.cpp:2231] Received SUBSCRIBE call for framework 'marathon' at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57@127.0.1.1:35928
I0425 20:22:14.412281  4286 master.cpp:2302] Subscribing framework marathon with checkpointing enabled and capabilities [  ]
I0425 20:22:14.412289  4286 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57@127.0.1.1:35928 already subscribed, resending acknowledgement
W0425 20:22:14.412302  4286 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at scheduler-d998dfb4-9cc2-4d22-9cb7-416433c2fb57@127.0.1.1:35928
E0425 20:22:14.412495  4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected

Any idea why it mentions a 'disconnected' framework? In Mesos, I can see the 3 slaves, and the Marathon and Chronos frameworks are listed under 'active frameworks'.

The /etc/hosts file contains the following entries:

192.0.2.11  master1  # VAGRANT: cd38e81ab8742b23dfbcb913468368ea (master1) / 1b611425-dbad-4bd0-8727-4169c09ec045
192.0.2.51  slave1  # VAGRANT: 94630539b67d178dddffda29a0313a75 (slave1) / 1a1694de-2bd2-4d96-bdf2-dd6767d1f310
192.0.2.52  slave2  # VAGRANT: 306e67b33b327b3d1c9990bf1316a321 (slave2) / bdbd677e-5298-4d49-90a8-e521139dd127
192.0.2.12  master2  # VAGRANT: fb338e9e9c001a5bfab605387ba88d02 (master2) / bdccfd80-b1e6-48a0-8986-b24c7cbd7a25
192.0.2.53  slave3  # VAGRANT: 3913b3358eadc90c622859ddb90bfede (slave3) / 786cbe69-2af5-43b7-8e70-d6cc07d4ddf4
192.0.2.13  master3  # VAGRANT: 92cdd6e36a6c0391e2a66f73661e56fe (master3) / 03bb2c16-f474-4412-b8f4-fce82e12955c

Note: in case more info is needed on how the cluster was installed, please refer to this

wiwa1978

2 Answers

You can also set LIBPROCESS_IP as an environment variable. I think this is better than changing /etc/hosts.

Found the solution here: https://groups.google.com/forum/#!topic/marathon-framework/1qboeZTOLU4
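
As a rough sketch of this approach, the snippet below builds a process environment with LIBPROCESS_IP set and refuses loopback addresses such as the 127.0.1.1 that shows up in the logs. The helper name `libprocess_env` is made up for illustration, and 192.0.2.11 is simply master1's address from the question's /etc/hosts. How the resulting environment reaches Marathon depends on how the service is started, so treat this as an illustration rather than a drop-in fix.

```python
import ipaddress
import os


def libprocess_env(ip, base_env=None):
    """Return a copy of the environment with LIBPROCESS_IP set to `ip`.

    Hypothetical helper: it rejects loopback addresses, because a scheduler
    that advertises 127.x.x.x cannot be reached by a master on another host.
    """
    addr = ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    if addr.is_loopback:
        raise ValueError(f"{ip} is a loopback address; use the host's routable IP")
    env = dict(os.environ if base_env is None else base_env)
    env["LIBPROCESS_IP"] = str(addr)
    return env


if __name__ == "__main__":
    # Assumption: 192.0.2.11 is the routable address of this host (master1).
    env = libprocess_env("192.0.2.11")
    print("LIBPROCESS_IP=" + env["LIBPROCESS_IP"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.Popen` when starting the scheduler from a wrapper script.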

TooAngel
  • You can set this environment variable in a config file used by the service. For example in Ubuntu, in /etc/default/marathon – Bertrand88 Aug 19 '16 at 11:46

I guess you need to make sure that the hostnames are resolvable to actual IP addresses.

That's at least what fixed my problems when Marathon etc. tried to bind to 127.0.1.1 on Ubuntu. On each host, add the IP-to-hostname mappings to the /etc/hosts file, e.g.

192.0.2.11 master1

Place this entry before the line that maps 127.0.1.1 to the hostname, or remove the 127.0.1.1 entry entirely. The Vagrant plugin vagrant-hostsupdater might help.
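
To check whether a host is affected, something like the following sketch can be run on each machine. It resolves the local hostname and reports whether the result is a loopback address (the logs in the question show both frameworks registering from 127.0.1.1). The function names `is_loopback` and `check_hostname` are illustrative, not part of Mesos or Marathon.

```python
import ipaddress
import socket


def is_loopback(address):
    """True if `address` is in a loopback range (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(address).is_loopback


def check_hostname(hostname=None):
    """Resolve `hostname` (default: this machine's hostname).

    Returns (hostname, address, ok). `ok` is False when the name does not
    resolve at all or resolves to a loopback address such as 127.0.1.1.
    """
    hostname = hostname or socket.gethostname()
    try:
        address = socket.gethostbyname(hostname)
    except OSError:  # covers socket.gaierror and socket.herror
        return hostname, None, False
    return hostname, address, not is_loopback(address)


if __name__ == "__main__":
    name, addr, ok = check_hostname()
    status = "OK" if ok else "PROBLEM: unresolvable or loopback"
    print(f"{name} -> {addr} ({status})")
```

If a host reports a loopback address, its /etc/hosts still maps the hostname to 127.0.1.1 ahead of the real IP.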

Tobi
  • Thanks for your suggestion. I did try vagrant-hostsupdater, but it doesn't solve the problem. The application in Marathon still remains in the 'Waiting' state. – wiwa1978 Apr 26 '16 at 09:56
  • Have you checked the contents of `/etc/hosts` on each host and verified that they contain an actual private IP? If so, have you restarted the Mesos master/slave/Marathon services? Can you show the contents? – Tobi Apr 26 '16 at 10:10
  • I updated the original question to add the /etc/hosts content. But all the entries are there. – wiwa1978 Apr 26 '16 at 12:53
  • Ok, but is the `127.0.1.1` entry still there? And have you restarted the services? If so, what's in the logs? – Tobi Apr 26 '16 at 14:22