
I am trying to restart an AWS ECS service (essentially, stop and start all tasks within the service) without making any changes to the task definition.

The reason is that the image is tagged latest on every build.

I have tried stopping all tasks and letting the service recreate them, but this causes a brief window of unavailability while the tasks restart on my two instances.

What is the best way to handle this? Something like a blue-green deployment strategy, so that there is no downtime?

This is what I have currently. Its shortcoming is that my app is down for a couple of seconds while the service's tasks are recreated after being stopped.

configure_aws_cli(){
    # Set the default region and output format for subsequent CLI calls
    aws --version
    aws configure set default.region us-east-1
    aws configure set default.output json
}

start_tasks() {
    # Start a fresh task from the task definition on the given container instance
    start_task=$(aws ecs start-task --cluster "$CLUSTER" --task-definition "$DEFINITION" --container-instances "$EC2_INSTANCE" --group "$SERVICE_GROUP" --started-by "$SERVICE_ID")
    echo "$start_task"
}

stop_running_tasks() {
    # List the service's task ARNs ($JQ is assumed to be jq with --raw-output)
    tasks=$(aws ecs list-tasks --cluster "$CLUSTER" --service-name "$SERVICE" | $JQ ".taskArns | .[]");
    tasks=( $tasks )
    # Stop each task that is currently running
    for task in "${tasks[@]}"
    do
        [[ ! -z "$task" ]] && stop_task=$(aws ecs stop-task --cluster "$CLUSTER" --task "$task")
    done
}

push_ecr_image(){
    echo "Push built image to ECR"
    eval $(aws ecr get-login --region us-east-1)
    docker push "$AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/repository:$TAG"
}

configure_aws_cli
push_ecr_image
stop_running_tasks
start_tasks
– John Kariuki

7 Answers

74

Use update-service and the --force-new-deployment flag:

aws ecs update-service --force-new-deployment --service my-service --cluster cluster-name
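If you are scripting this, you can then block until the rollover completes using the services-stable waiter, a minimal sketch reusing the names above:

# Wait until the new tasks are running and the deployment has settled
aws ecs wait services-stable --cluster cluster-name --services my-service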
– Ben Whaley
  • If the task definition doesn't change, `update-service` does nothing, even with `--force-new-deployment`. Look at the responses below; you have to switch to another task definition: https://stackoverflow.com/a/42794838/1121497 or https://stackoverflow.com/a/42798106/1121497. – Ferran Maylinch Dec 14 '19 at 18:07
  • No, that's not correct. `--force-new-deployment` starts 2 new tasks, registers them in the ALB target group, deregisters the previous tasks, drains connections on the previous tasks, and stops them. I've just confirmed it. – Ben Whaley Jan 10 '20 at 16:18
  • I got `An error occurred (ClusterNotFoundException) when calling the UpdateService operation: Cluster not found.` – Xin Apr 30 '20 at 00:56
  • @Xin you need to mention the cluster name: `aws ecs update-service --force-new-deployment --service test --profile test --region us-east-2 --cluster stage-test`. Note that the above command just restarts the service without changing the task definition; this is good when you always use `latest` or a static tag and revision. – Adiii May 31 '20 at 01:29
  • @BenWhaley is correct. I've done the same. – Scott McAllister Jan 14 '21 at 19:23
  • Select the service you want to restart in the ECS console. Go to Update => check Force new deployment and do not make any other changes. Continue through the screens until you complete the process. – Ben Whaley Oct 03 '22 at 15:54
16

Hold on a sec. If I understood your use case correctly, this is addressed in the official docs:

If your updated Docker image uses the same tag as what is in the existing task definition for your service (for example, my_image:latest), you do not need to create a new revision of your task definition. You can update the service using the procedure below, keep the current settings for your service, and select Force new deployment....

To avoid downtime, you should tune two parameters: minimum healthy percent and maximum percent:

For example, if your service has a desired number of four tasks and a maximum percent value of 200%, the scheduler may start four new tasks before stopping the four older tasks (provided that the cluster resources required to do this are available). The default value for maximum percent is 200%.

This basically means that, regardless of whether and how much your task definition changed, there can be an "overlap" between the old tasks and the new ones, and this is how you achieve resilience and reliability.
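As a hedged illustration (the cluster and service names are placeholders), both values can also be set from the CLI while forcing a deployment:

aws ecs update-service --cluster my-cluster --service my-service \
    --deployment-configuration "maximumPercent=200,minimumHealthyPercent=100" \
    --force-new-deployment

With four desired tasks, minimumHealthyPercent=100 keeps the old tasks serving until their replacements are healthy, and maximumPercent=200 lets up to eight tasks run during the rollover.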

UPDATE: Amazon has just introduced External Deployment Controllers for ECS (both EC2 and Fargate). It includes a new level of abstraction called TaskSet. I haven't tried it myself yet, but such fine-grained control over service and task management (both APIs are supported) can potentially solve problems like this one.
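For completeness, a rough sketch of the TaskSet flow (all names and ARNs below are placeholders; the service must use the EXTERNAL deployment controller, and create-task-set takes additional launch-type/network parameters that are omitted here):

# Bring up a new "green" task set next to the existing one
aws ecs create-task-set --cluster my-cluster --service my-service \
    --task-definition my-task-def:2

# Promote it to primary once healthy, then remove the old task set
aws ecs update-service-primary-task-set --cluster my-cluster \
    --service my-service --primary-task-set <green-task-set-arn>
aws ecs delete-task-set --cluster my-cluster --service my-service \
    --task-set <old-task-set-arn>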

– d4nyll, yuranos
6

After you push your new image to your Docker repository, you can create a new revision of your task definition (it can be identical to the existing task definition) and update your service to use the new task definition revision. This will trigger a service deployment, and your service will pull the new image from your repository.

This way your task definition stays the same (although updating the service to a new task definition revision is required to trigger the image pull), and still uses the "latest" tag of your image, but you can take advantage of the ECS service deployment functionality to avoid downtime.
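For instance, a minimal sketch of that flow (the family, cluster, and service names are placeholders, and jq is assumed to be available): re-register the current task definition as a new, identical revision and point the service at it:

# Grab the active task definition
TASK_DEF=$(aws ecs describe-task-definition --task-definition my-family \
    --query 'taskDefinition' --output json)
# Drop the read-only fields that register-task-definition rejects
NEW_DEF=$(echo "$TASK_DEF" | jq 'del(.taskDefinitionArn, .revision, .status, .requiresAttributes, .compatibilities, .registeredAt, .registeredBy)')
# Register the identical definition as a new revision, then roll the service onto it
aws ecs register-task-definition --cli-input-json "$NEW_DEF"
aws ecs update-service --cluster my-cluster --service my-service \
    --task-definition my-family   # no revision given, so the newest one is used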

– Eric N
3

Having to create a new revision of my task definition every time, even when nothing in the task definition itself has changed, does not feel right.

There are a bunch of crude bash implementations of this, which suggests that AWS should have the ECS service scheduler listen for image changes/updates, especially in an automated build process.

My crude workaround was to have two identical task definitions and switch between them on every build. That way I don't accumulate redundant revisions.

Here is the specific script snippet that does that.

update_service() {
    echo "change task definition and update service"
    # Which of the two (identical) task definitions is the service running now?
    taskDefinition=$(aws ecs describe-services --cluster "$CLUSTER" --services "$SERVICE" | $JQ ".services | .[].taskDefinition")
    # Flip to the other one
    if [ "$taskDefinition" = "$TASK_DEF_1" ]; then
        newDefinition="$TASK_DEF_2"
    else
        newDefinition="$TASK_DEF_1"
    fi
    # Updating the service to the "new" definition triggers a rolling deployment
    rollUpdate=$(aws ecs update-service --cluster "$CLUSTER" --service "$SERVICE" --task-definition "$newDefinition")
}
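A hypothetical invocation for reference (all values below are placeholders; TASK_DEF_1 and TASK_DEF_2 must hold the exact task definition ARNs that describe-services returns, or the string comparison will never match):

export CLUSTER=my-cluster SERVICE=my-service
export TASK_DEF_1=arn:aws:ecs:us-east-1:123456789012:task-definition/app-blue:1
export TASK_DEF_2=arn:aws:ecs:us-east-1:123456789012:task-definition/app-green:1
export JQ="jq --raw-output"   # raw output so the ARN comes back unquoted
update_service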
– John Kariuki
-2

Have you solved this yet? Perhaps this will work for you.

With a new release image pushed to ECR with both a version tag (e.g. v1.05) and the latest tag, the image URI in my task definition needed to be explicitly updated with the version tag suffix, like :v1.05.

With :latest, this new image did not get pulled by the new container after aws ecs update-service --force-new-deployment --service my-service.

I was doing tagging and pushing like this:

docker tag ${imageId} ${ecrRepoUri}:v1.05
docker tag ${imageId} ${ecrRepoUri}:latest
docker push ${ecrRepoUri}

...whereas this is the proper way to push multiple tags:

docker tag ${imageId} ${ecrRepoUri}
docker push ${ecrRepoUri}:v1.05
docker push ${ecrRepoUri}:latest

This was briefly mentioned in the official docs without a proper example.
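As a sanity check (the repository name here is a placeholder), you can confirm that both tags now point at the same image digest:

aws ecr describe-images --repository-name my-repo \
    --image-ids imageTag=latest imageTag=v1.05 \
    --query 'imageDetails[].{digest: imageDigest, tags: imageTags}'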

– ux.engineer
-4

This works great: https://github.com/fdfk/ecsServiceRestart

python ecsServiceRestart.py restart --services="app app2" --cluster=test

-6

The quick and dirty way:

  • log in to the EC2 instance running the task
  • find your container with docker container list
  • restart it with docker restart [container]
– yurez
  • This is not a solution, nor is it programmable in any reliable way. There are aws-cli commands to handle a graceful switchover. – Ste Mar 03 '21 at 16:14