taskKillGracePeriodSeconds is not working for DC/OS Marathon Application?

Question

We have set up a DC/OS (version 1.9) cluster on AWS nodes. We are creating a Marathon application definition with "taskKillGracePeriodSeconds" set to 60. We also catch SIGTERM in our application so that it can shut down gracefully. But this is not working: on scale down or destroy, Marathon kills the application immediately instead of waiting 60 seconds as expected. We do get the SIGTERM callback, but the application is killed immediately after that. We have also tried starting the Mesos agents with the following attributes set in the file /var/lib/dcos/mesos-slave-common: MESOS_ATTRIBUTES=executor_shutdown_grace_period:60secs;docker_stop_timeout:60secs, but this does not help either.

The DC/OS cluster agents run centos-release-7-2.1511.el7.centos.2.10.x86_64.

Has anybody been able to use taskKillGracePeriodSeconds successfully?

Please help us work this out.

Thanks.

score 1 · Accepted Answer · answered Apr 12 '17 at 12:58


Are you using Docker containers?

As far as I remember, there was a problem with forwarding the SIGTERM signal when using process groups (= containers).

Just to test this on your cluster, can you deploy an app with the following command, using only the Mesos containerizer and a taskKillGracePeriodSeconds of 10 seconds?

trap "echo ' killing' && sleep 5 && echo 'test' && sleep 100" SIGTERM && sleep 100000
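For reference, the test above could be wrapped in a Marathon app definition roughly as follows. This is only a sketch: the app id `/sigterm-test`, the file name `sigterm-test.json`, and the `cpus`/`mem` values are placeholders that do not come from the thread, and it assumes the DC/OS CLI is installed and attached to the cluster. Leaving out a `container` section means the app runs under the Mesos containerizer.

```shell
#!/bin/sh
# Write a minimal Marathon app definition for the SIGTERM test.
# The id, cpus and mem values are placeholders chosen for illustration.
cat > sigterm-test.json <<'EOF'
{
  "id": "/sigterm-test",
  "cmd": "trap \"echo ' killing' && sleep 5 && echo 'test' && sleep 100\" SIGTERM && sleep 100000",
  "cpus": 0.1,
  "mem": 32,
  "instances": 1,
  "taskKillGracePeriodSeconds": 10
}
EOF

# Deploy it (requires a configured DC/OS CLI, so it is left commented out):
#   dcos marathon app add sigterm-test.json
# Destroy it again to trigger the kill and watch the task's stdout:
#   dcos marathon app remove /sigterm-test
```

If the grace period is honoured, the task's stdout should show the trap output before the task is finally killed about 10 seconds after the destroy request.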


unterstein


Yes, we are using Docker containers. We run a Java application inside a Docker container and we are able to catch the termination signal. We have added a hook via Runtime.addShutdownHook in our application, and it is called when the application is killed. But after that the application is terminated immediately, because Marathon stops the Docker container. We have also tried a simple Marathon application that uses the Mesos containerizer, and the taskKillGracePeriodSeconds timeout works for it. Is there any way to shut down gracefully a Marathon application that runs in a Docker container? – Sachin Apr 13 '17 at 06:47

The problem with Docker is that it depends on how you use it. The best way to make sure that the SIGTERM is sent to the underlying process is to use the `entrypoint` functionality in your Dockerfile and not use the Marathon `cmd` section. If changing to entrypoint does not help in your case, you can post your (stripped down) Dockerfile and your (stripped down) Marathon app definition and I will try to reproduce it :) – unterstein Apr 13 '17 at 13:35
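To illustrate the entrypoint advice: a likely reason it helps is that a command started through a wrapping shell is not the container's main process, so the SIGTERM lands on the shell rather than on the application. An entrypoint that ends in `exec` avoids this, because `exec` replaces the shell with the application under the same PID. The sketch below is not taken from the thread; the file name `entrypoint.sh` is hypothetical, and it checks the PID behaviour locally without Docker.

```shell
#!/bin/sh
# Create a tiny wrapper entrypoint (hypothetical file name).
cat > entrypoint.sh <<'EOF'
#!/bin/sh
set -e
# One-time setup could go here.
# exec replaces this shell with the given command, so the command keeps
# the wrapper's PID and receives signals such as SIGTERM directly.
exec "$@"
EOF
chmod +x entrypoint.sh

# Local check: print the PID before entering the wrapper and again from
# the command that the wrapper exec's. Both lines should be identical.
pids=$(sh -c 'echo $$; exec ./entrypoint.sh sh -c "echo \$\$"')
echo "$pids"
```

In a Dockerfile this would typically be wired up with the exec (JSON array) form, for example an `ENTRYPOINT` pointing at the wrapper and a `CMD` that lists the Java command, with the Marathon `cmd` field left empty. The paths and the Java command line are up to your image.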
After adding an entrypoint to the container, we are able to catch SIGTERM in the Java process running in the container and shut the process down gracefully. taskKillGracePeriodSeconds now works as well. Thanks for your help. – Sachin Apr 18 '17 at 04:22
This is great news! It would be awesome if you could mark this answer as solving your problem <3 – unterstein Apr 18 '17 at 08:09
