6

We have a Docker container (Spring Boot) that runs in an ECS cluster. We run it without Elastic Load Balancing.

We want to update the service without downtime, so that the old task stops only once the new task is up and healthy. We have been trying to add a health check to the task definition, but it refuses to work. I have tried these basic health check commands.

[ "CMD-SHELL","exit 0" ]
[ "CMD-SHELL","exit 1" ]

I would expect the former to result in a task with a HEALTHY health status, and the latter to fail the health checks. In both cases, the new task starts fine, with an UNKNOWN health status.

Has this anything to do with us not using an ELB? The documentation is not very good, and my Google searches have not returned anything useful.

Andrew Schulman
chris_fitz

4 Answers

8

The answer by raja and the edit by Andrew are slightly off for ECS/Fargate. The command is entered without the brackets and without the quotes:

CMD-SHELL, curl -f http://localhost/ || exit 1

That is the correct format when entering health check information in an ECS task definition.

Valid documentation: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_healthcheck

Not valid documentation for ECS/Fargate: https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_HealthCheck.html

nopuck4you
  • 2
    But what if the container doesn't have port 80 exposed, for example a queue or Redis container? – Ariful Haque May 14 '20 at 01:39
  • 1
    @ArifulHaque Then use a port number in your address, for example http://localhost:1200/. If there is no port exposed, then you may need to expose one for health check verification. – nopuck4you Jun 05 '20 at 15:32
  • 1
    This is not an invalid format. If you use the UI to configure the health check, then yes, the format is "CMD-SHELL, curl -f http://localhost/ || exit 1". But if you use the raw JSON configuration, it is [ "CMD-SHELL", "curl -f http://localhost/ || exit 1" ] – Juliano Apr 20 '21 at 20:26
  • 1
    Why add `|| exit 1`? The only function that can have is to translate higher non-zero exit codes (say, code 20) to 1, which shouldn't be needed. – ErikE May 09 '22 at 20:57
6

The given command is syntactically invalid.

It should be:

[ "CMD-SHELL", "curl -f http://localhost/ || exit 1" ]
  • CMD-SHELL - runs the command with the container's default shell (CMD runs the command arguments directly, without a shell)
  • curl -f http://localhost/ - the actual command that is executed inside the container to perform the health check
  • || exit 1 - if the curl command fails, the shell exits with status 1, which marks the check as failed

So you should change your command as shown below.

[ "CMD-SHELL", "echo hi || exit 1" ]

echo hi is the health check command in my example. You can run any command instead of "echo hi", as long as it returns exit status 0 when it runs successfully in your container.
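To see what the trailing || exit 1 does to the status that the health check reports, here is a minimal shell sketch. It assumes that CMD-SHELL hands the string to /bin/sh -c (the usual default shell on Linux images). run_check is a hypothetical helper name used only for illustration; it is not part of ECS or Docker.

```shell
#!/bin/sh
# Hypothetical helper: run a health check string through a shell, roughly
# the way CMD-SHELL is assumed to, and print the resulting exit status.
run_check() {
    status=0
    sh -c "$1" >/dev/null 2>&1 || status=$?
    echo "$status"
}

run_check 'exit 0'                # always passes
run_check 'exit 1'                # always fails
run_check 'echo hi || exit 1'     # echo succeeds, so the status stays 0
run_check 'false || exit 1'       # the command fails, so the status is 1
run_check '(exit 20) || exit 1'   # a failure status of 20 is reported as 1
```

A status of 0 is what makes the check pass. The || exit 1 suffix only changes which non-zero status a failing command reports.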

Andrew Schulman
raja
  • 1
    Just a suggestion: if [ "CMD-SHELL", "curl -f http://localhost/ || exit 1" ] does not work, then the localhost IP can be used instead: [ "CMD-SHELL", "curl -f http://127.0.0.1/ || exit 1" ] – xs2rashid Aug 15 '18 at 22:39
  • Would the double quotes cause any issue? Example: ["CMD-SHELL", "curl -s http://localhost:8080 | jq -e '.scheduler.status=="healthy' || exit 1"] – Durja Mar 11 '21 at 15:42
  • Why add `|| exit 1`? The only function that can have is to translate higher non-zero exit codes (say, code 20) to 1, which shouldn't be needed. – ErikE May 09 '22 at 20:58
4

If you use ecs-cli to deploy your Fargate services, I found that you must upgrade to a version that supports the healthcheck in the task definition. I also found that using CMD-SHELL is not required. In fact, it breaks when you add it, because your CMD-SHELL gets wrapped in another CMD-SHELL in the JSON of the generated task definition (as seen in the AWS console).

So what worked for me was upgrading ecs-cli from 1.4.0 to 1.7.0 and then adding healthcheck in the ecs-params.yml file under the service:

task_definition:
  ecs_network_mode: awsvpc
  task_role_arn: arn:aws:iam::........
  task_execution_role: arn:aws:iam::........
  task_size:
    cpu_limit: 2048
    mem_limit: 4GB
  services:
    foo:
      healthcheck:
        command: ps cax | grep "[p]ython"
        interval: 30s
        timeout: 10s
        retries: 2
      essential: true
sjwoodr
1

For example, if your "Port mappings" is 8000:8000 for "ECS EC2" or 8000 tcp for "ECS Fargate", the "Command" of "HEALTHCHECK" is:

CMD-SHELL, curl -f http://localhost:8000/ || exit 1

Don't forget the 8000 after http://localhost: in this case.
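As a rough sketch, the same check for an arbitrary port can be written out in both notations mentioned in this thread (the console form field and the raw JSON task definition). The function names below are hypothetical, and the sketch assumes curl is installed in the container image and that the application answers on / at that port.

```shell
#!/bin/sh
# Hypothetical helpers: print the health check command for a given port
# in the two notations discussed in this thread.

# Console form field: no brackets, no quotes.
health_check_console() {
    printf 'CMD-SHELL, curl -f http://localhost:%s/ || exit 1\n' "$1"
}

# Raw JSON task definition: an array of two strings.
health_check_json() {
    printf '[ "CMD-SHELL", "curl -f http://localhost:%s/ || exit 1" ]\n' "$1"
}

health_check_console 8000
health_check_json 8000
```

Both lines describe the same check. Only the place where you enter it decides which notation applies.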

  • 1
    Why add `|| exit 1`? The only function that can have is to translate higher non-zero exit codes (say, code 20) to 1, which shouldn't be needed. – ErikE May 09 '22 at 20:58