
I'm running a bunch of services in Docker containers on Mesos (v0.22.1) via Marathon (v0.9.0), and sometimes Mesos kills tasks. It usually happens to multiple services at once.

Log line related to this issue from mesos-slave.ERROR log:

Failed to update resources for container 949b1491-2677-43c6-bfcf-bae6b40534fc 
of executor production-app-emails.15437359-a95e-11e5-a046-e24e30c7374f running task production-app-emails.15437359-a95e-11e5-a046-e24e30c7374f 
on status update for terminal task, 
destroying container: Failed to determine cgroup for the 'cpu' subsystem: 
Failed to read /proc/21292/cgroup: 
Failed to open file '/proc/21292/cgroup': No such file or directory
Ihar Krasnik

1 Answer


I'd strongly suggest updating your stack. Mesos 0.22.1 and Marathon 0.9.0 are quite outdated as of today; Mesos 0.26.0 and Marathon 0.13.0 are out.

Concerning your problem, have a look at

The first one suggests fixes on the Mesos side (post 0.22.1), and the second indicates a lack of resources of the started containers.

Maybe try increasing the RAM for the affected containers first, and if that doesn't help, upgrade the Mesos stack, IMHO.
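To make the "increase the RAM" step concrete, here is a minimal sketch that only builds (and does not send) a request raising an app's `mem` value through Marathon's `PUT /v2/apps/{appId}` endpoint. The host `marathon.example:8080`, the app ID `/production-app-emails` (guessed from the task name in the log), the current value of 256 MB, and the helper name `build_mem_update` are all assumptions for illustration; check them against your own setup and Marathon version.

```python
import json
from urllib.parse import quote


def build_mem_update(marathon_url, app_id, current_mem_mb, factor=2.0):
    """Return (method, url, body) for a Marathon app update that raises `mem`.

    Hypothetical helper: it only prepares the request and sends nothing.
    """
    if factor <= 1.0:
        raise ValueError("factor must be greater than 1")
    # Marathon expresses `mem` in MB; round down to a whole number.
    new_mem_mb = int(current_mem_mb * factor)
    # App IDs start with '/', so strip it before appending to /v2/apps/.
    url = marathon_url.rstrip("/") + "/v2/apps/" + quote(app_id.strip("/"))
    body = json.dumps({"mem": new_mem_mb})
    return "PUT", url, body


# Assumed values: adjust host, app ID and current memory to your cluster.
method, url, body = build_mem_update(
    "http://marathon.example:8080", "/production-app-emails", 256
)
print(method, url, body)
```

Sending the resulting request (for example with `curl -X PUT -H 'Content-Type: application/json'`) should trigger a new deployment of the app with the larger memory limit, so try it on a single service first.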

Tobi
  • Thanks for your advice. I am running Marathon in production; is there any simple way to upgrade Mesos and Marathon? – Ihar Krasnik Dec 23 '15 at 16:51
  • Have a look at the official docs: http://mesos.apache.org/documentation/latest/upgrades/ I think in your case an upgrade involves quite a few things to keep in mind, and I'd suggest that you try it in a test environment first (if you have one)... – Tobi Dec 24 '15 at 09:55
  • I had this problem: my task got killed and then updated, which generated the cgroup error. That error is just a false positive, i.e. not the root cause of the problem (in my case, the root cause was a lack of resources allocated to the container). – Martin Tapp Apr 19 '16 at 12:38