
TL;DR: See edit at bottom.

I am attempting to set up continuous deployments for the new environment we are migrating to at my company.

I am using an AWS CloudFormation stack to contain all of my infrastructure. When I create the stack, my instances initialize correctly through their AWS::AutoScaling::LaunchConfiguration setup and, specifically, AWS::CloudFormation::Init (this is within my CloudFormation JSON template). That launch config/init script pulls down a Docker container of my application and runs it on EC2 instances within my AWS::AutoScaling::AutoScalingGroup.
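
For reference, the init script boils down to commands along these lines (image name, tag, and port mapping are placeholders, not my real values):

# Pull the application image and run it, restarting it if it ever dies.
docker pull <my-repo>/<my-app>:<tag>
docker run -d --restart always -p 80:8080 <my-repo>/<my-app>:<tag>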

When I commit to my repo, I have a .travis.yml file set up to run an aws cloudformation update-stack command. This ensures that any new instances brought into my stack will run the newest version of my Docker image when the init script runs.

How do I invalidate my current stack instances and bring in my new stack instances with zero downtime?

Right now, as stated in my question, I receive a 503 error. It occurs during the window when my old instances have become invalid and my new instances are still "warming up".

I would like my new instances to warm up while remaining inaccessible, then be added once they are warm and ready, and only then have the old instances removed.
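
(As I understand it, the piece that gates traffic this way is the Auto Scaling group using the load balancer's health check rather than plain EC2 status checks - something like the following, with an illustrative grace period:)

aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name <auto-scaling-group-name> \
    --health-check-type ELB \
    --health-check-grace-period 300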

Here is what I'm currently doing that runs into this problem:

aws cloudformation update-stack \
    --stack-name <stack-name> \
    --template-body file://<template-file>.json \
    --profile <my-profile> \
    --parameters <params>
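
(One step I could add here is blocking until the update has actually finished before touching any instances:)

aws cloudformation wait stack-update-complete \
    --stack-name <stack-name> \
    --profile <my-profile>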

Then, either:

# This runs for each INSTANCE_ID in the current stack.
aws autoscaling set-instance-health \
    --profile <my-profile> \
    --instance-id ${INSTANCE_ID} \
    --health-status Unhealthy \
    --no-should-respect-grace-period

Or:

aws autoscaling detach-instances \
    --auto-scaling-group-name <auto-scaling-group-name> \
    --no-should-decrement-desired-capacity \
    --profile <my-profile> \
    --instance-ids <instance-1> <instance-2>

Any insight on how I can eliminate downtime when swapping Auto Scaling group instances would be appreciated!

I would also be open to creating instances and then adding them to the Auto Scaling group through the attach-instances command. However, I'm unaware of how to provision those instances with the pre-existing AWS::AutoScaling::LaunchConfiguration, and I want to keep my process DRY rather than duplicating that functionality.

Thanks for the help!


EDIT:

I found a direct solution for replacing the EC2 instances within my Auto Scaling group, straight from the AWS documentation:

The AutoScalingReplacingUpdate and AutoScalingRollingUpdate policies apply only when you do one or more of the following:

  • Change the Auto Scaling group's AWS::AutoScaling::LaunchConfiguration.
  • Change the Auto Scaling group's VPCZoneIdentifier property.
  • Change the Auto Scaling group's LaunchTemplate property.
  • Update an Auto Scaling group that contains instances that don't match the current LaunchConfiguration.

I realized that the easiest solution for me was to change the name of the Auto Scaling group with something similar to the following:

"WebServerGroup":{
    "Type":"AWS::AutoScaling::AutoScalingGroup",
    "Properties":{
        "AutoScalingGroupName": { 
            "Fn::Sub" : "MyWebServerGroup-${UniqueDockerTag}" 
        }, 
        ...
    },
    "UpdatePolicy":{
        "AutoScalingRollingUpdate":{
            "MaxBatchSize":"50",
            "MinSuccessfulInstancesPercent": 100,
            "PauseTime": "PT5M",
            "WaitOnResourceSignals": true
        },
        "AutoScalingReplacingUpdate" : {
            "WillReplace" : "true"
        }

     }
}

The ${UniqueDockerTag} parameter is passed in to my template and is unique to each build, so for my use case it was an easy and efficient solution.
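
For completeness, the CI side passes that tag with each update - roughly the following, using Travis's commit SHA as the unique value (adapt to whatever your CI exposes):

aws cloudformation update-stack \
    --stack-name <stack-name> \
    --template-body file://<template-file>.json \
    --profile <my-profile> \
    --parameters ParameterKey=UniqueDockerTag,ParameterValue="${TRAVIS_COMMIT}"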

A new AutoScalingGroup is created and, once it has finished creating, the old AutoScalingGroup is deleted. This is all done with zero downtime.

Hope this helps! Cheers!

domdambrogia

1 Answer


What you are trying to do looks like a cross between a rolling deployment and a blue-green deployment.

If I were you I would consider a couple of other options before trying to fix your specific issue.

1. Use ECS (or EKS) Cluster

Instead of managing an AutoScaling Group where each instance actively pulls the container, and replacing the EC2 instances to deploy new releases, you should consider using an ECS Cluster and ECS Services.

An ECS Cluster is where you run your containers. It is also an AutoScaling Group of EC2 instances, but rather than actively pulling your container image, the instances join the ECS Cluster and wait for instructions on what to run.

That's where ECS Services come in - an ECS Service describes what you want to run, i.e. your container definition, parameters, etc. It then schedules the containers (ECS Tasks) on the available ECS Cluster nodes.

Deploying a new version of your app is as simple as updating the ECS Service definition - it can be done as a rolling update, all at once, etc. It integrates seamlessly with ALB, ELB, etc., and you can certainly achieve zero-downtime releases.

Using ECS removes the need to replace the EC2 instances with every container release - only the actual containers are replaced.
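
A rough sketch of what a release then looks like - every name below is a placeholder, not something from your existing setup:

# Register a new revision of the task definition pointing at the new image
# tag, then point the service at it - ECS rolls the tasks over for you.
aws ecs register-task-definition \
    --family my-app \
    --container-definitions '[{"name":"my-app","image":"my-repo/my-app:'"${TAG}"'","memory":512,"essential":true}]'

aws ecs update-service \
    --cluster my-cluster \
    --service my-app-service \
    --task-definition my-app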

2. Proper Blue-Green deployment

Another option is a proper blue-green deployment, where you build a completely new environment and then switch the traffic, usually at the DNS level.

That means your CloudFormation template for each release will contain the complete infrastructure (ASG, LaunchConfig, ALB, ...) and you end up with two instances of the stack - e.g. app-blue and app-green. While Blue is active you're free to tear down and re-deploy Green. Test it, and once happy, switch the DNS from the Blue ALB to the Green ALB. With the next release you repeat the same for Blue.

Again, the benefit is that you've got an easy rollback path (simply switch the DNS back to the Blue ALB if the Green deployment turns out to be broken), and again it allows zero downtime.
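
The switch itself can be a single Route 53 change. A sketch, with every ID and name a placeholder (note the AliasTarget HostedZoneId is the ALB's zone ID, not your own hosted zone's):

aws route53 change-resource-record-sets \
    --hosted-zone-id <your-hosted-zone-id> \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "app.example.com.",
          "Type": "A",
          "AliasTarget": {
            "HostedZoneId": "<green-alb-zone-id>",
            "DNSName": "<green-alb-dns-name>",
            "EvaluateTargetHealth": true
          }
        }
      }]
    }'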


Update: just announced at AWS re:Invent 2018 - Blue/Green deployment support for ECS. This new functionality may facilitate your ECS Service releases without having to build a complete environment every time.


Hope that helps :)

MLu
  • This is the second AWS question in the last month or so you've answered in depth. I owe you a beer or coffee of some sort - feel free to send over a BTC/ETH/LTC address. Seriously, thank you for the insight, it is very appreciated. I think that the ECS route describes a good concept of what I'm after. Blue-Green sounds nice but I think it's overkill for what I'm looking to achieve, I have other services that need to persist when I update my stack as mentioned, and that adds another layer of unnecessary complexity. Thanks MLu! - Cheers, Dominic. – domdambrogia Nov 30 '18 at 18:51
  • @domdambrogia no worries, I'm glad I could help! Simply upvote and eventually accept my answers here on SF, that's a good enough "thank you" :) I'm working towards earning a silver badge for the [`amazon-web-services`](https://serverfault.com/search?tab=votes&q=user%3a122588%20%5bamazon-web-services%5d) tag, so every upvote helps ;) BTW I have updated the answer with a link to the just-announced functionality for ECS Blue/Green deployments. – MLu Nov 30 '18 at 21:23