I deployed a number of services in Docker containers using the Marathon framework (via Mesosphere), and sometimes Marathon kills running tasks.
The services use HTTP health checks (intervalSeconds = 30, maxConsecutiveFailures = 3, timeoutSeconds = 20).
It happens randomly. Sometimes I can even watch a task turn red in the Marathon UI even though the HTTP check works fine in a browser (so the service is healthy). Marathon then kills and restarts the service, which hurts overall system performance.
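To check whether the endpoint is occasionally slow rather than down, a probe like the sketch below could be run from the Mesos slave host, since a browser on another machine may not see what Marathon sees. It is only a rough approximation of the health check, not Marathon's actual logic: the function names are made up for illustration, the URL is whatever the task really exposes, and counting 2xx/3xx responses as healthy is an assumption.

```python
import time
import urllib.error
import urllib.request

# Values copied from the health check settings described above.
INTERVAL_SECONDS = 30
TIMEOUT_SECONDS = 20
MAX_CONSECUTIVE_FAILURES = 3


def probe(url, timeout=TIMEOUT_SECONDS):
    """Return (healthy, elapsed_seconds) for one HTTP GET against url."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Assumption: any 2xx/3xx status counts as a passing check.
            healthy = 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections and timeouts.
        healthy = False
    return healthy, time.monotonic() - start


def first_kill_index(outcomes, max_failures=MAX_CONSECUTIVE_FAILURES):
    """Return the index of the check that completes a run of max_failures
    consecutive failures, or None if no such run occurs."""
    streak = 0
    for index, healthy in enumerate(outcomes):
        streak = 0 if healthy else streak + 1
        if streak >= max_failures:
            return index
    return None


def watch(url, checks):
    """Probe url `checks` times, one probe per interval, and log each result."""
    outcomes = []
    for _ in range(checks):
        healthy, elapsed = probe(url)
        outcomes.append(healthy)
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} healthy={healthy} elapsed={elapsed:.2f}s")
        if first_kill_index(outcomes) is not None:
            print("three consecutive failures: a kill would be expected here")
            break
        time.sleep(INTERVAL_SECONDS)
    return outcomes
```

Calling `watch` with the task's real health check URL and, say, `checks=20` prints one line per probe. Slow or failed probes whose timestamps line up with the "Asked to kill task" entries in the slave log would point at the endpoint rather than at Marathon.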
Any advice would be helpful.
Mesos (v0.22.1), Marathon (v0.9.0)
Logs:
I1223 12:23:45.058763 32718 slave.cpp:1581] Asked to kill task prod-tracker-backend-processor.63dbfa9b-a965-11e5-a046-e24e30c7374f of framework 20150527-135958-3712123914-5050-2238-0000
I1223 12:23:45.189750 32720 slave.cpp:2531] Handling status update TASK_KILLED (UUID: 09e76bce-f24c-4999-8933-270baf023c62) for task prod-tracker-backend-processor.63dbfa9b-a965-11e5-a046-e24e30c7374f of framework 20150527-135958-3712123914-5050-2238-0000 from executor(1)@10.132.66.219:33503
I1223 12:23:45.214113 32718 docker.cpp:1009] Updated 'cpu.shares' to 102 at /sys/fs/cgroup/cpu/docker/9e0bc3b40ad9b37c4a0f6133ca1316c2addd2e2c5a7941e56a4e1770d7afd3a2 for container 9dad82a5-34e1-4bf9-a641-17129464a226
W1223 12:23:45.214740 32718 docker.cpp:1021] Container 9dad82a5-34e1-4bf9-a641-17129464a226 does not appear to be a member of a cgroup where the 'memory' subsystem is mounted
I1223 12:23:45.216114 32724 status_update_manager.cpp:317] Received status update TASK_KILLED (UUID: 09e76bce-f24c-4999-8933-270baf023c62) for task prod-tracker-backend-processor.63dbfa9b-a965-11e5-a046-e24e30c7374f of framework 20150527-135958-3712123914-5050-2238-0000
I1223 12:23:45.216359 32724 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_KILLED (UUID: 09e76bce-f24c-4999-8933-270baf023c62) for task prod-tracker-backend-processor.63dbfa9b-a965-11e5-a046-e24e30c7374f of framework 20150527-135958-3712123914-5050-2238-0000
I1223 12:23:45.221278 32720 slave.cpp:2776] Forwarding the update TASK_KILLED (UUID: 09e76bce-f24c-4999-8933-270baf023c62) for task prod-tracker-backend-processor.63dbfa9b-a965-11e5-a046-e24e30c7374f of framework 20150527-135958-3712123914-5050-2238-0000 to master@10.132.8.65:5050
I1223 12:23:45.222024 32720 slave.cpp:2709] Sending acknowledgement for status update TASK_KILLED (UUID: 09e76bce-f24c-4999-8933-270baf023c62) for task prod-tracker-backend-processor.63dbfa9b-a965-11e5-a046-e24e30c7374f of framework 20150527-135958-3712123914-5050-2238-0000 to executor(1)@10.132.66.219:33503
I1223 12:23:45.233886 32725 status_update_manager.cpp:389] Received status update acknowledgement (UUID: 09e76bce-f24c-4999-8933-270baf023c62) for task prod-tracker-backend-processor.63dbfa9b-a965-11e5-a046-e24e30c7374f of framework 20150527-135958-3712123914-5050-2238-0000