I have been running Marathon, Mesos, and Docker together without trouble, but I recently ran into a problem. When mesos-slave encounters an exception, the task's state in Marathon changes to TASK_LOST, and the task cannot be killed until about 15 minutes have passed.
I tested this by manually rebooting the machine that runs the mesos-slave service, Docker, and the task. The task state shown in the Marathon UI became "Unscheduled(100%)", and the task could not be killed, either automatically or manually, until about 15 minutes had passed. My question is: how can I reduce this time? I tried adding the following Marathon startup command-line arguments:
task_launch_confirm_timeout=30000
scale_apps_interval = 30000
task_lost_expunge_initial_delay = 30000
task_launch_timeout = 30000
and adding the following mesos-slave startup command-line argument:
recovery_timeout=1mins
but none of this made any difference; the delay is still about 15 minutes.