
I am new to Mesos/Marathon. I have a cluster of 5 Mesos slaves with one master. Jobs are placed on the Mesos slaves, and the space used under /var/lib/mesos/slaves/../executors keeps increasing when a task fails and is deployed again and again.

backend_gig.42c25d62-2f07-11e7-9b48-025317f685e8             
backend_kw-subscribe.d8bbfff0-2f09-11e7-9b48-025317f685e8
backend_gig.5fb8ab00-2f01-11e7-9b48-025317f685e8             
backend_kw-subscribe.d9d9c645-2f01-11e7-9b48-025317f685e8
backend_gigya.7218ec06-2f04-11e7-9b48-025317f685e8           
backend_kw-subscribe.f7c1bb09-2f05-11e7-9b48-025317f685e8
backend_gigya.97960c51-2f03-11e7-9b48-025317f685e8           
backend_kw-subscribe.fc36ac17-2f06-11e7-9b48-025317f685e8
backend_gig.9e4a9ab7-2f09-11e7-9b48-025317f685e8             
backend_charging-mock.3fcf883a-2e56-11e7-8876-025317f685e8
backend_gig.ac4c9a67-2f06-11e7-9b48-025317f685e8             

How do I remove the directories of jobs that are no longer running (failed or older jobs) on the Mesos slaves? Is that controlled by Mesos/Marathon, or should I set up a cron job or a script to remove the directories? Please advise, as these directories eat up a lot of disk space and the slaves go down and are unable to start any tasks.

anudeep

1 Answer


Mesos has its own system for cleaning up old sandboxes.

From the documentation:

Sandbox files are scheduled for garbage collection when:

  • An executor is removed or terminated.

  • A framework is removed.

  • An executor is recovered unsuccessfully during agent recovery.

NOTE: During agent recovery, all of the executor’s runs, except for the latest run, are scheduled for garbage collection as well.

Garbage collection is scheduled based on the --gc_delay agent flag. By default, this is one week since the sandbox was last modified. After the delay, the files are deleted.

--gc_disk_headroom=VALUE adjusts the disk headroom used to calculate the maximum executor directory age. Age is calculated as gc_delay * max(0.0, (1.0 - gc_disk_headroom - disk usage)), evaluated every --disk_watch_interval. gc_disk_headroom must be a value between 0.0 and 1.0 (default: 0.1).
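For example, a minimal sketch of how these flags might be passed when starting the agent (the master address, work_dir, and the specific values below are assumptions for illustration, not taken from the question):

    # shorten the GC delay and reserve more disk headroom (example values only)
    mesos-agent --master=zk://master.example.com:2181/mesos \
                --work_dir=/var/lib/mesos \
                --gc_delay=2days \
                --gc_disk_headroom=0.2 \
                --disk_watch_interval=1mins

On package-based installs the same flags are often set through files under /etc/mesos-slave (for example, a file named gc_delay containing 2days), and the agent (mesos-slave on older versions) must be restarted for the change to take effect.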

janisz
  • Thanks @janisz, that worked for me. I set only gc_delay. Would that suffice, or should "gc_disk_headroom" also be set? I am confused about using this parameter, could you explain? – anudeep May 04 '17 at 11:00
  • GC disk headroom applies only when the disk is nearly full. If you have plenty of free space, probably only gc_delay will count. From my experience it's all about experimenting with how fast the executor sandboxes grow; if disk usage grows drastically, then you should either reduce gc_delay or increase gc_disk_headroom – janisz May 04 '17 at 11:13
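To see how the two flags interact, here is a rough sketch of the age formula quoted above, using made-up example numbers (85% disk usage, default gc_delay and gc_disk_headroom):

    # effective max sandbox age = gc_delay * max(0.0, 1.0 - gc_disk_headroom - disk usage)
    gc_delay_secs=$((7*24*3600))   # --gc_delay=1weeks (default)
    headroom=0.1                   # --gc_disk_headroom (default)
    usage=0.85                     # assumed current disk usage: 85%
    awk -v d="$gc_delay_secs" -v h="$headroom" -v u="$usage" \
        'BEGIN { a = 1.0 - h - u; if (a < 0) a = 0;
                 printf "max sandbox age: %.1f hours\n", d * a / 3600 }'
    # with these numbers: 604800 * 0.05 / 3600 = 8.4 hours

In other words, the fuller the disk, the sooner sandboxes become eligible for deletion; once usage exceeds 1.0 - gc_disk_headroom, the computed age drops to zero and sandboxes are removed at the next disk check.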