2

I am running Docker containers on Mesos/Marathon and want to implement health checks, basically by running a health check script. My question is: will the health check command run inside the container or on the slave? I assume it runs at the container level, since health checks are defined per application, but I would like to confirm this. I couldn't find any documentation that says where it runs.

Thanks

I did try an echo to /tmp/testfile via the command, and the file shows up on the slave. Does this mean the command runs on the slave? I just need confirmation. Any additional information would be useful.

SeattleOrBayArea

2 Answers

6

The short answer is: it depends. Long answer below : ).

Command health checks are run by the Mesos docker executor in your task container via docker exec. If you run your containers using the "unified containerizer", i.e., Docker containers without the Docker daemon, things are similar, except that there is no docker exec: the Mesos executor simply enters the mnt namespace of your container before executing the command health check (see this doc). HTTP and TCP health checks are run by the Marathon scheduler, hence not necessarily on the node where your container is running (unless you run Marathon on the same node as the Mesos agent, which you probably should not be doing). Check out this page.

Starting with Mesos 1.2.0 and Marathon 1.3, it is possible to run so-called Mesos-native health checks. In this case, both HTTP(S) and TCP health checks run on the agent where your container is running. To make sure the container network can be reached, these checks enter the net namespace of your container.
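To summarize the above, here is a small Python sketch of a lookup table from the health check protocol to the place where the check executes. The protocol names are the ones Marathon uses; the description strings and the function name `where_it_runs` are illustrative labels only, not values from any Marathon or Mesos API.

```python
# Where each health check protocol executes, per the explanation above.
# The description strings are illustrative, not Marathon/Mesos API values.
CHECK_LOCATION = {
    "COMMAND": "agent: inside the task container (docker exec or mnt namespace)",
    "HTTP": "Marathon scheduler: request sent over the network",
    "HTTPS": "Marathon scheduler: request sent over the network",
    "TCP": "Marathon scheduler: connection opened over the network",
    "MESOS_HTTP": "agent: inside the task's net namespace",
    "MESOS_HTTPS": "agent: inside the task's net namespace",
    "MESOS_TCP": "agent: inside the task's net namespace",
}


def where_it_runs(protocol: str) -> str:
    """Return a human-readable description of where a check executes."""
    try:
        return CHECK_LOCATION[protocol.upper()]
    except KeyError:
        raise ValueError(f"unknown health check protocol: {protocol!r}") from None


if __name__ == "__main__":
    for name in CHECK_LOCATION:
        print(f"{name:<12} {where_it_runs(name)}")
```

So a health check script configured as a COMMAND check should be expected to see the container's filesystem, not the agent's.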

rukletsov
  • Thanks for answering, it is helpful. Just some follow-ups. I am basically running Marathon with Mesos. In this case I want to check whether the slave is reachable from Marathon, which involves checking the liveness of a separate process running on the slave. I also want to check whether the container is reachable. Can the health check command be used to do this? I was thinking of using pidof on the slave process as a health check, and thus wanted to know whether, in this environment, the command will be run on the slave or in the container. – SeattleOrBayArea Dec 13 '16 at 21:02
  • If you would like to check that a container is reachable from the scheduler, then a Marathon HTTP or TCP health check is the right thing to do (I assume you can express "reachable" as some HTTP request). Logically, if Marathon can reach the task, it can also reach the agent where the task is running. The question is why you would want to check that Marathon can reach the agent. What problem are you trying to solve? – rukletsov Dec 13 '16 at 22:21
  • Yeah, I can use container reachability, which is enough I think. And as you said, it implies agent is reachable if task is. Thanks again. – SeattleOrBayArea Dec 15 '16 at 00:37
0

Mesos-level health checks (MESOS_HTTP, MESOS_HTTPS, MESOS_TCP, and COMMAND) are locally executed by Mesos on the agent running the corresponding task and thus test reachability from the Mesos executor. Mesos-level health checks offer the following advantages over Marathon-level health checks:

Mesos-level health checks are performed as close to the task as possible, so they are not affected by networking failures.

Mesos-level health checks are delegated to the agents running the tasks, so the number of tasks that can be checked can scale horizontally with the number of agents in the cluster.

Limitations and considerations

Mesos-level health checks consume extra resources on the agents; moreover, there is some overhead for fork-execing a process and entering the tasks’ namespaces every time a task is checked.

The health check processes share resources with the task that they check. Your application definition must account for the extra resources consumed by the health checks.

Mesos-level health checks require tasks to listen on the container’s loopback interface in addition to whatever interface they require. If you run a service in production, you will want to make sure that the users can reach it.

Marathon currently does NOT support the combination of Mesos and Marathon level health checks.
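As a minimal sketch of that last limitation, a pre-deployment check along the following lines could flag an app definition that mixes the two levels. The function name `mixes_levels` and the sample data are hypothetical, and treating a missing `protocol` field as `HTTP` is an assumption about Marathon's default rather than something stated above.

```python
# Protocol names grouped by level, as listed in the text above.
MESOS_LEVEL = {"MESOS_HTTP", "MESOS_HTTPS", "MESOS_TCP", "COMMAND"}
MARATHON_LEVEL = {"HTTP", "HTTPS", "TCP"}


def mixes_levels(health_checks):
    """Return True if the list contains both Mesos- and Marathon-level checks.

    Assumption: a check without a "protocol" field is treated as "HTTP".
    """
    protocols = {check.get("protocol", "HTTP") for check in health_checks}
    return bool(protocols & MESOS_LEVEL) and bool(protocols & MARATHON_LEVEL)


if __name__ == "__main__":
    # Hypothetical healthChecks list mixing a Marathon-level and a Mesos-level check.
    checks = [
        {"protocol": "HTTP", "path": "/api/health", "portIndex": 0},
        {"protocol": "COMMAND", "command": {"value": "true"}},
    ]
    print(mixes_levels(checks))
```

Running such a check before submitting the app definition is just one option; Marathon itself remains the authority on what it accepts.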

Example usage HTTP:

{
  "path": "/api/health",
  "portIndex": 0,
  "protocol": "HTTP",
  "gracePeriodSeconds": 300,
  "intervalSeconds": 60,
  "timeoutSeconds": 20,
  "maxConsecutiveFailures": 3,
  "ignoreHttp1xx": false
}

or Mesos HTTP:

{
  "path": "/api/health",
  "portIndex": 0,
  "protocol": "MESOS_HTTP",
  "gracePeriodSeconds": 300,
  "intervalSeconds": 60,
  "timeoutSeconds": 20,
  "maxConsecutiveFailures": 3
}

or secure HTTP:

{
  "path": "/api/health",
  "portIndex": 0,
  "protocol": "HTTPS",
  "gracePeriodSeconds": 300,
  "intervalSeconds": 60,
  "timeoutSeconds": 20,
  "maxConsecutiveFailures": 3,
  "ignoreHttp1xx": false
}

Note: HTTPS health checks do not verify the SSL certificate.

or TCP:

{
  "portIndex": 0,
  "protocol": "TCP",
  "gracePeriodSeconds": 300,
  "intervalSeconds": 60,
  "timeoutSeconds": 20,
  "maxConsecutiveFailures": 0
}

or COMMAND:

{
  "protocol": "COMMAND",
  "command": { "value": "curl -f -X GET http://$HOST:$PORT0/health" },
  "gracePeriodSeconds": 300,
  "intervalSeconds": 60,
  "timeoutSeconds": 20,
  "maxConsecutiveFailures": 3
}

{
  "protocol": "COMMAND",
  "command": { "value": "/bin/bash -c \\\"</dev/tcp/$HOST/$PORT0\\\"" }
}
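For context, each of the fragments above is one entry of the `healthChecks` array in a Marathon app definition. The following Python sketch assembles such a definition around a single check. The app id, image name, and resource values are hypothetical, and port or network settings are omitted because their layout depends on the Marathon version.

```python
import json


def build_app(app_id, image, health_check):
    """Wrap one health check fragment in a minimal Marathon app definition.

    Port and network settings are intentionally left out of this sketch.
    """
    return {
        "id": app_id,
        "cpus": 0.5,
        "mem": 128,
        "instances": 1,
        "container": {"type": "DOCKER", "docker": {"image": image}},
        "healthChecks": [health_check],
    }


if __name__ == "__main__":
    # Same fields as the "Mesos HTTP" fragment above.
    mesos_http = {
        "path": "/api/health",
        "portIndex": 0,
        "protocol": "MESOS_HTTP",
        "gracePeriodSeconds": 300,
        "intervalSeconds": 60,
        "timeoutSeconds": 20,
        "maxConsecutiveFailures": 3,
    }
    # "/my-service" and "example/my-service:latest" are placeholder values.
    app = build_app("/my-service", "example/my-service:latest", mesos_http)
    print(json.dumps(app, indent=2))
```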

Further Information: https://mesosphere.github.io/marathon/docs/health-checks.html

snukone