
I have seen multiple articles discussing blue/green deployments and they consistently involve forcing recreation of the Launch Configuration and the Autoscaling Group. For example:

https://groups.google.com/forum/#!msg/terraform-tool/7Gdhv1OAc80/iNQ93riiLwAJ

This works great in general except that the desired capacity of the ASG gets reset to the default. So if my cluster is under load then there will be a sudden drop in capacity.

My question is this: is there a way to execute a Terraform blue/green deployment without a loss of capacity?

Giulio Vian
CAS

2 Answers


I don't have a full Terraform-only solution to this.

The approach I use is to run a small script that reads the current desired capacity, stores it in a variable, and then uses that variable in the ASG definition.

handle-desired-capacity:
    @echo "Handling current desired capacity"
    @echo "---------------------------------"
    @if [ -z "$(env)" ]; then \
        echo "Cannot continue without an environment"; \
        exit 1; \
    fi
    $(eval DESIRED_CAPACITY := $(shell aws autoscaling describe-auto-scaling-groups --profile $(env) | jq -SMc '.AutoScalingGroups[] | select((.Tags[]|select(.Key=="Name")|.Value) | match("prod-asg-app")).DesiredCapacity'))
    @if [ -z "$(DESIRED_CAPACITY)" ]; then \
        echo "Could not determine desired capacity."; \
        exit 1; \
    fi
    @if [ "$(DESIRED_CAPACITY)" -lt 2 ] || [ "$(DESIRED_CAPACITY)" -gt 10 ]; then \
        echo "Can only deploy between 2 and 10 instances."; \
        exit 1; \
    fi
    @echo "Desired Capacity is $(DESIRED_CAPACITY)"
    @sed -i.bak 's!desired_capacity = [0-9]*!desired_capacity = $(DESIRED_CAPACITY)!g' $(env)/terraform.tfvars
    @rm -f $(env)/terraform.tfvars.bak
    @echo ""

Clearly, this is as ugly as it gets, but it does the job.
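A possible variation on the same idea, sketched here rather than taken from the Makefile above: let the AWS CLI filter by the Name tag with --query (so jq is not needed) and pass the value to Terraform with -var instead of rewriting terraform.tfvars. The function names, the tag fragment argument, and the Terraform variable name desired_capacity are assumptions; the 2 to 10 bounds are copied from the Makefile.

```shell
#!/bin/sh
# Sketch only: function names and arguments are illustrative assumptions.

# Print the desired capacity of the first ASG whose Name tag contains $1.
# $1 = fragment of the Name tag (e.g. prod-asg-app), $2 = AWS CLI profile.
# If nothing matches, the output is not a number and validate_capacity rejects it.
get_desired_capacity() {
    aws autoscaling describe-auto-scaling-groups \
        --profile "$2" \
        --query "AutoScalingGroups[?Tags[?Key=='Name' && contains(Value, '$1')]] | [0].DesiredCapacity" \
        --output text
}

# Succeed only if $1 is a plain integer between 2 and 10 inclusive.
validate_capacity() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;  # empty, or contains a non-digit
    esac
    [ "$1" -ge 2 ] && [ "$1" -le 10 ]
}

# Example usage (not executed here):
#   capacity=$(get_desired_capacity prod-asg-app prod) || exit 1
#   validate_capacity "$capacity" || { echo "Refusing to deploy with capacity '$capacity'" >&2; exit 1; }
#   terraform apply -var "desired_capacity=$capacity"
```

Values given with -var take precedence over terraform.tfvars, so the file on disk stays untouched between runs.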

I am looking to see if we can get the name of the ASG as an output from the remote state that I can then use on the next run to get the desired capacity, but I'm struggling to understand this enough to make it useful.
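One way this might look, as a sketch under assumptions rather than something I have running: if the configuration declared an output exposing the ASG name (called asg_name here, a hypothetical name), a recent Terraform could read it back from the configured state with terraform output -raw before the next apply, and the AWS CLI could then query that one group by name instead of matching on tags.

```shell
#!/bin/sh
# Sketch only: the output name "asg_name" and both function names are assumptions.

# Print the ASG name recorded in state by the previous apply.
# Assumes the Terraform configuration declares an output named asg_name
# whose value is the name attribute of the aws_autoscaling_group resource.
current_asg_name() {
    terraform output -raw asg_name
}

# Print the desired capacity of the ASG named $1.
desired_capacity_for() {
    aws autoscaling describe-auto-scaling-groups \
        --auto-scaling-group-names "$1" \
        --query 'AutoScalingGroups[0].DesiredCapacity' \
        --output text
}

# Example usage (not executed here):
#   name=$(current_asg_name) || exit 1
#   capacity=$(desired_capacity_for "$name") || exit 1
#   terraform apply -var "desired_capacity=$capacity"
```

On the very first run there is no output in state yet, so the lookup fails and a default capacity would have to be supplied some other way.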

Richard A Quadling
  • OOI, the limits are 2 to 10 because we never want only 1 instance running (we run multi-AZ), and because we can only run up to 20 instances of the same class: if the load already needs more than 10, we don't attempt to replace anything while it is that busy. Obviously, you can adjust this to suit your requirements. – Richard A Quadling Apr 30 '20 at 21:51

As a second answer, I wrapped the AWS CLI + jq approach into a Terraform module.

https://registry.terraform.io/modules/digitickets/cli/aws/latest

module "current_desired_capacity" {
  source            = "digitickets/cli/aws"
  assume_role_arn   = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/OrganizationAccountAccessRole"
  role_session_name = "GettingDesiredCapacityFor${var.environment}"
  aws_cli_commands  = ["autoscaling", "describe-auto-scaling-groups"]
  aws_cli_query     = "AutoScalingGroups[?Tags[?Key==`Name`]|[?Value==`digitickets-${var.environment}-asg-app`]]|[0].DesiredCapacity"
}

With that in place, module.current_desired_capacity.result gives you the current desired capacity of the ASG you have nominated in the aws_cli_query.

Again, this is quite ugly, but the formalisation of this means you can now access a LOT of properties from AWS that are not yet available within Terraform.

This is a gentle hack. No resources are passed around, and it was written purely with read-only lookups of single scalar values in mind, so please use it with care.

As the author, I'd be happy to explain anything about this via the GitHub Issues page at https://github.com/digitickets/terraform-aws-cli/issues

Richard A Quadling