
In a nutshell, we have a platform comprising several applications/servers. Terraform manages both the AWS infrastructure (VPC, subnets, IGW, security groups, ...) and application deployment (using Ansible as a provisioner from Terraform). For each deployment, Packer builds all AMIs and tags them with an appropriate name, so Terraform picks up the latest AMIs.
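For illustration, the AMI lookup on the Terraform side looks roughly like the following sketch (the tag scheme and resource names are placeholders, not our exact configuration):

```hcl
# Sketch: pick up the most recent AMI that Packer tagged for this app.
# The "app-server-*" naming scheme is illustrative.
data "aws_ami" "app" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "tag:Name"
    values = ["app-server-*"]
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.app.id # always the latest Packer build
  instance_type = "t3.medium"
}
```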

The process generally works, but we face a dilemma when we want to deploy small hotfixes, which can happen quite frequently since QA testing after each deployment may uncover regressions. For each application that needs a hotfix (not necessarily all of them), we create a hotfix branch and build the artifact (a jar or deb package). Then there are two options:

  • Either trigger Packer to build a new image, tag it with the appropriate hotfix version, and run terraform apply (sketched below).
  • Or run an Ansible job to hot-deploy the application package and restart the service/application if needed.
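To make the first option concrete, the Packer side would look roughly like this HCL2 sketch (all names and the hotfix_version variable are invented for illustration):

```hcl
# Illustrative Packer HCL2 template for the hotfix image build.
variable "hotfix_version" {
  type = string
}

source "amazon-ebs" "app" {
  ami_name      = "app-server-${var.hotfix_version}"
  region        = "eu-west-1"
  source_ami    = "ami-0123456789abcdef0" # placeholder base image
  instance_type = "t3.medium"
  ssh_username  = "ubuntu"

  tags = {
    Name    = "app-server-${var.hotfix_version}"
    Release = var.hotfix_version
  }
}

build {
  sources = ["source.amazon-ebs.app"]

  # Reuse the same Ansible playbook that deploys the application.
  provisioner "ansible" {
    playbook_file = "./deploy.yml"
  }
}
```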

With the first approach, we stay true to the immutable-infrastructure idea, but it has downsides: any small change in the Terraform configuration or the infrastructure causes a change in the terraform plan. For example, a security group may have drifted outside of Terraform state (e.g. from ad-hoc IP whitelisting), and applying Terraform would revert those changes. The whole process of building an AMI and running terraform apply is also quite heavy.
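For reference, Terraform's lifecycle meta-argument could paper over that particular kind of drift, at the cost of Terraform no longer correcting it (a sketch with placeholder names, not something we currently use):

```hcl
resource "aws_security_group" "app" {
  name   = "app-server"
  vpc_id = var.vpc_id

  lifecycle {
    # Terraform will no longer detect or revert manual rule changes,
    # e.g. ad-hoc IP whitelisting done outside of Terraform.
    ignore_changes = [ingress, egress]
  }
}
```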

We're leaning more toward the second approach because it's easy, but we still wonder: is it good practice?

Arcobaleno
    It's entirely up to you but I'd recommend the first one because it leaves you in a good, known state. It's also basically impossible to do the second one with ASGs which would be a deal breaker for me personally. Also having Terraform go and undo changes people have manually made (such as opening up security groups outside of Terraform) seems like a good thing rather than a bad thing. – ydaetskcoR May 03 '18 at 10:03
  • We are looking at exactly the same kind of scenario. I would lean towards the second option and deploy the hotfix to the application or instance configuration using Ansible, as long as you are not touching any AWS resources with it. – JamesP May 31 '18 at 09:17
  • Your system is in a known state as it's recorded by the playbooks, as long as the changes are incorporated into your images whenever other changes require creating new images. I'm not sure whether removing the SGs from Terraform and then importing them would allow you to continue without Terraform rebuilding dependent parts of your infrastructure. – JamesP May 31 '18 at 09:25
  • @ydaetskcoR I would disagree that it's impossible. Depending on how code is pulled in, as long as the instances have userdata that initializes them by running an Ansible playbook, every new host can have the latest code, and you can update pre-existing hosts manually with Ansible from a remote host. However, I would agree that it's probably messy and not as advisable as keeping things immutable. – stobiewankenobi Sep 05 '18 at 19:54

1 Answer


For code changes, I recommend using Packer to build AMIs as part of your CI pipeline. It can definitely be cumbersome to manage launch configuration changes with Terraform and ASGs, given how buggy that can be, but I think the result is much cleaner and safer than updating code with Ansible. You do technically have a "record" of changes, since you know your Ansible playbooks and what state they are in, but I think the process should be driven from a CI pipeline that builds immutable artifacts.
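To make that concrete, the usual Terraform pattern for rolling a new AMI through an ASG looks roughly like this (a sketch; resource names and sizes are illustrative):

```hcl
resource "aws_launch_configuration" "app" {
  name_prefix   = "app-"
  image_id      = data.aws_ami.app.id # latest Packer-built AMI
  instance_type = "t3.medium"

  # A new AMI creates a new launch configuration before the old one is destroyed.
  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "app" {
  # Embedding the launch configuration name forces the ASG to be replaced
  # (and re-created first) whenever the AMI changes.
  name                 = "app-${aws_launch_configuration.app.name}"
  launch_configuration = aws_launch_configuration.app.name
  min_size             = 2
  max_size             = 4
  vpc_zone_identifier  = var.subnet_ids

  lifecycle {
    create_before_destroy = true
  }
}
```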

If you really want to stick with Ansible, you can always bake into your userdata an Ansible playbook run that pulls in the latest code from master (or whatever branch you choose). This ensures new hosts come up with the latest code, and you can manually invoke Ansible against pre-existing hosts. Alternatively, you can rotate EC2 instances to update code by doubling the desired capacity and scaling back down once the new instances are healthy. This can all be highly automated and gives you a pseudo canary deployment. Again, though, I'd recommend sticking with immutable builds.
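A minimal sketch of that userdata idea, assuming ansible-pull and a placeholder repository URL:

```hcl
resource "aws_launch_template" "app" {
  name_prefix   = "app-"
  image_id      = data.aws_ami.app.id
  instance_type = "t3.medium"

  user_data = base64encode(<<-EOF
    #!/bin/bash
    # On first boot, fetch and apply the latest playbook from master.
    ansible-pull -U https://example.com/infra/app-playbooks.git deploy.yml
  EOF
  )
}
```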

Out of curiosity, is there any reason you're not using Docker? I'm sure you have a good business reason, but moving to Docker simplifies a lot of this as well: it's much easier to build a Docker container and update an ECS task definition than to deploy an entirely new AMI/EC2 instance.
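For comparison, a hotfix on ECS is often just a new image tag in the task definition (a rough sketch; the account ID, names, and app_version variable are all placeholders):

```hcl
resource "aws_ecs_task_definition" "app" {
  family                   = "app"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "256"
  memory                   = "512"

  container_definitions = jsonencode([
    {
      name         = "app"
      image        = "123456789012.dkr.ecr.eu-west-1.amazonaws.com/app:${var.app_version}"
      essential    = true
      portMappings = [{ containerPort = 8080 }]
    }
  ])
}
```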

stobiewankenobi
  • +1 for mentioning Docker. I've faced the same dilemma as @Arcobaleno and in the end dockerizing my services was the answer. – cfelipe Jul 07 '19 at 22:47