I've been doing some preliminary HA testing with Marathon and can't get the recovery time below 1.5 minutes. Because I am running Mesos 0.22.1, I did not set the checkpoint flag, and setting executor_registration_timeout to 10s or 15s does not seem to improve the failover time. Are there other parameters I need to configure in Mesos or Marathon to achieve faster recovery?
Cheers,
Nastooh
- Could you provide the flags you use to start the Mesos master and slave? – haosdent Aug 13 '15 at 17:54
- The problem seems to be related to 0.22, where slave_ping_timeout=15 and max_slave_ping_timeouts=5 are hardcoded. Please see the "Mesos Slave Failover time" email thread on user@mesos.apache.org. Cheers, – Nastooh Aug 14 '15 at 18:38
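
For anyone checking the numbers: a rough sketch of the arithmetic behind those two hardcoded values. It assumes the ping timeout is in seconds and that the master declares a slave lost only after all of the allowed pings have timed out; the function name is made up for illustration and is not part of Mesos.

```python
# Values quoted in the comment above; treating the timeout as seconds is an assumption.
SLAVE_PING_TIMEOUT_SECS = 15
MAX_SLAVE_PING_TIMEOUTS = 5


def slave_removal_delay_secs(ping_timeout_secs, max_timeouts):
    """Approximate wait before an unresponsive slave is declared lost."""
    return ping_timeout_secs * max_timeouts


delay = slave_removal_delay_secs(SLAVE_PING_TIMEOUT_SECS, MAX_SLAVE_PING_TIMEOUTS)
print(f"{delay} s (~{delay / 60:.2f} min)")
```

If that reading is right, the detection window alone is about 75 seconds, which would account for most of the 1.5 minutes observed, and tuning executor_registration_timeout would not be expected to shorten it.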