2

I am building a Mesos cluster from scratch (using Vagrant, which is not relevant for this issue).

OS: Ubuntu 16.04 (trusty)

Setup:

  • Master -> Runs ZooKeeper, Mesos-master, Marathon and Chronos
  • Slave -> Runs Mesos-slave

This is my provisioning script for the master node: https://github.com/zeitgeist2018/infrastructure/blob/fix-marathon/provision/scripts/install-master.sh

I have managed to register the slave with Mesos, install the Marathon and Chronos frameworks, and run scheduled jobs in Chronos (both with Docker and with shell commands), but I can't get Marathon to work properly. The UI gets stuck on "Loading applications" as soon as I open it, and when I call the API, the request hangs forever with no response. Through the API I tried both fetching basic Marathon information and creating deployments, and both hang in the same way. I've been checking the Marathon logs, but I don't see any errors there, just a few entries that may (or may not) be a hint:

[2020-03-08 10:33:21,819] INFO  Prompting Mesos for a heartbeat via explicit task reconciliation (mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$anon$1:marathon-akka.actor.default-dispatcher-6)
[2020-03-08 10:33:21,822] INFO  Received fake heartbeat task-status update (mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor:Thread-87)
[2020-03-08 10:33:25,957] INFO  Found no roles suitable for revive repetition. (mesosphere.marathon.core.launchqueue.impl.ReviveOffersStreamLogic$ReviveRepeaterLogic:marathon-akka.actor.default-dispatcher-7)


Cristian
  • Can you share more logs and output from `/v2/app` and `/info`? – janisz Mar 09 '20 at 12:59
  • I already fixed the issue, you can see it in my answer below – Cristian Mar 09 '20 at 16:19
  • Downgrading is not a fix. You may miss important updates in the future – janisz Mar 09 '20 at 18:09
  • I agree, but since this cluster I'm building is just for fun and downgrading makes it work, I consider it a fix. Anyway, what do you have in mind in order to fix the actual issue? – Cristian Mar 12 '20 at 11:16
  • Check the `/v2/apps` endpoint. If you can see your app there, inspect the UI in the browser and check the network tab. I think something is blocking the UI from getting responses from the server. – janisz Mar 12 '20 at 12:37
  • As I mentioned in the original post, I tried to call the API directly, but it just hangs forever, not even giving an error, that's why I'm a bit lost and can't find any hint. – Cristian Mar 12 '20 at 15:22
  • Can you share more logs? – janisz Mar 17 '20 at 16:19
  • Did anyone figure this out? – SynAck Apr 11 '20 at 21:14
  • Not yet. I have this project stopped for some time while I'm working on other things. If not using the very latest version is not an issue for you, just downgrade it. – Cristian Apr 11 '20 at 21:22
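The endpoint checks discussed in these comments can be sketched with a time limit, so that a hanging request fails fast instead of blocking the terminal. This is only a sketch: the helper names `marathon_url` and `probe` are hypothetical, and it assumes Marathon is listening on its default port 8080 on localhost.

```shell
# Hypothetical helper: build a Marathon API URL from host, port and path.
marathon_url() {
  printf 'http://%s:%s%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: request a URL but give up after a fixed number of
# seconds, so a hanging endpoint shows up as a timeout instead of blocking.
probe() {
  url="$1"
  timeout_s="${2:-5}"
  # -sS: quiet but still report errors; -o /dev/null: discard the body;
  # -w '%{http_code}': print only the HTTP status code.
  curl -sS --max-time "$timeout_s" -o /dev/null -w '%{http_code}\n' "$url"
}

# Usage (assumes Marathon listens on localhost:8080):
#   probe "$(marathon_url localhost 8080 /v2/info)"
#   probe "$(marathon_url localhost 8080 /v2/apps)"
```

When the time limit is hit, curl exits with status 28, which makes it possible to tell a hanging endpoint apart from a refused connection.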

3 Answers

2

Installing JDK 11 and selecting it as the default Java fixed this issue for me, without downgrading Marathon to another version.

On Ubuntu 20.04:

sudo apt install openjdk-11-jre-headless
sudo update-alternatives --config java
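
To confirm which Java is actually the default after switching, the major version reported by `java -version` can be checked. The following is a minimal sketch; the function names `java_major` and `current_java_version` are hypothetical helpers, not part of any tool.

```shell
# Hypothetical helper: extract the major Java version from a version string
# such as "11.0.6" or the legacy "1.8.0_242" form.
java_major() {
  v="$1"
  # Legacy scheme: "1.8.0_242" means Java 8, so drop the leading "1.".
  case "$v" in
    1.*) v="${v#1.}" ;;
  esac
  # Keep only the digits before the first non-digit character.
  echo "${v%%[!0-9]*}"
}

# Hypothetical helper: read the quoted version from `java -version`,
# which prints to stderr.
current_java_version() {
  java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'
}

# Report the default Java only if one is installed.
if command -v java >/dev/null 2>&1; then
  echo "Default Java major version: $(java_major "$(current_java_version)")"
fi
```

If the reported major version is not 11, the `update-alternatives` selection did not take effect for the current shell.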

1

I increased the number of CPUs of the virtual machine on which Marathon was installed to 3, and the problem was solved.
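
A quick way to see how many CPUs the guest actually has is `nproc`, run inside the VM. The sketch below uses a hypothetical helper, `has_min_cpus`; the threshold of 3 is simply the value from this answer, not a documented Marathon requirement.

```shell
# Hypothetical check: succeed only if at least $1 CPUs are available.
has_min_cpus() {
  min="$1"
  # nproc (GNU coreutils) prints the number of available processing units.
  count="$(nproc)"
  [ "$count" -ge "$min" ]
}

if has_min_cpus 3; then
  echo "CPU count looks sufficient: $(nproc)"
else
  echo "Only $(nproc) CPU(s) visible; consider raising the VM's CPU count"
fi
```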

0

I have managed to make it work. It was as simple as downgrading Marathon to v1.7.189. After that, it starts properly, and the API responds to requests.

Cristian