Background:
I have an existing blue/green architecture setup that uses two load balancers (blue/green) and swaps between them based on Route53 config. Each set of applications in these load balancers has its own distinct blue/green target groups and its own security groups (shared for blue/green).
For various reasons, I'm looking into changing this setup to use a single load balancer, but just swap the target groups.
I'm using the CDK to do this. I'm re-using the existing security groups for the new instances in this load balancer. When I deploy the CloudFormation, it asks me to confirm that I want to modify the security groups to allow ingress from the new load balancer. Makes sense; yes, I do.
Here is a snippet from the resulting CloudFormation for the AWS::EC2::SecurityGroupIngress addition:
Resources:
  somerandomnamebythecdk:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      IpProtocol: tcp
      Description: Load balancer to target
      FromPort: 80
      GroupId: sg-XXXXX
      SourceSecurityGroupId: sg-YYYYYY
      ToPort: 80
    Metadata:
      aws:cdk:path: somerandomnamebythecdkformetadata:80
Where sg-XXXXX was a parameter to the CDK and sg-YYYYYY is the SG of the ALB, which is also managed outside of this particular CloudFormation. The SG itself is not a part of this CloudFormation.
So then a "deployment" via the CDK could be changing the CloudFormation template to swap the blue/green target groups that are in use by the load balancer. I'm using a test-listener port (8443). So whichever target group is "secondary" gets wired into that listener. If I'm totally shutting down one side, then I map both the prod and test listener (443/8443) to the same (active) target group. This kind of mimics what happens with an ECS Blue/Green Deployment. However, in my case, I'm doing this with EC2 instances/AMIs.
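To make the listener wiring concrete, here is a minimal sketch of the decision described above. It is plain TypeScript, not CDK API; the names `Color`, `ListenerWiring`, and `wireListeners` are hypothetical and only illustrate which target group ends up behind the prod (443) and test (8443) listeners.

```typescript
// Hypothetical types and names; this models the wiring decision only,
// it does not call the CDK or CloudFormation.
type Color = "blue" | "green";

interface ListenerWiring {
  prod: Color; // target group behind the production listener (443)
  test: Color; // target group behind the test listener (8443)
}

function otherColor(c: Color): Color {
  return c === "blue" ? "green" : "blue";
}

// If the secondary side is running, it is wired to the test listener.
// If it is shut down, both listeners point at the active target group.
function wireListeners(active: Color, secondaryRunning: boolean): ListenerWiring {
  return {
    prod: active,
    test: secondaryRunning ? otherColor(active) : active,
  };
}

// Example: blue is live, green is staged behind the test listener.
const wiring = wireListeners("blue", true);
console.log(wiring);
```

A "deployment" in this model is just calling `wireListeners` with the other color and re-synthesizing the template with the result.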
Issue
When I tear down this CloudFormation, it correctly undoes the AWS::EC2::SecurityGroupIngress addition. However, it also appears to remove the existing AWS::EC2::SecurityGroupIngress rules from the current blue/green load balancers, which then makes those applications unreachable/unhealthy in the blue/green ALBs because they can't get a successful healthcheck on port 80.
I can tell in CloudTrail that there is a RevokeSecurityGroupIngress event happening. However, it's running for one of the blue/green ALBs instead-of/in-addition-to the new/single ALB.
Here is a sanitized version of that event:
{
  "eventVersion": "1.05",
  "userIdentity": {
    "type": "AssumedRole",
    "principalId": "ACCOUNT:MYUSER",
    "arn": "arn:aws:sts::999999999999:assumed-role/ROLE/MYUSER",
    "accountId": "999999999999",
    "sessionContext": {
      "sessionIssuer": {
        "type": "Role",
        "principalId": "ACCOUNT",
        "arn": "arn:aws:iam::999999999999:role/ROLE",
        "accountId": "999999999999",
        "userName": "ROLE"
      },
      "webIdFederationData": {},
      "attributes": {
        "mfaAuthenticated": "false",
        "creationDate": "2020-11-18T15:39:08Z"
      }
    },
    "invokedBy": "cloudformation.amazonaws.com"
  },
  "eventTime": "2020-11-18T18:14:02Z",
  "eventSource": "ec2.amazonaws.com",
  "eventName": "RevokeSecurityGroupIngress",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "cloudformation.amazonaws.com",
  "userAgent": "cloudformation.amazonaws.com",
  "requestParameters": {
    "groupId": "sg-XXXXX",
    "ipPermissions": {
      "items": [
        {
          "ipProtocol": "tcp",
          "fromPort": 80,
          "toPort": 80,
          "groups": {
            "items": [
              {
                "groupId": "sg-ZZZZZ"
              }
            ]
          },
          "ipRanges": {},
          "ipv6Ranges": {},
          "prefixListIds": {}
        }
      ]
    }
  },
  "responseElements": {
    "requestId": "GUID1",
    "_return": true,
    "unknownIpPermissionSet": {}
  },
  "requestID": "GUID1",
  "eventID": "GUID2",
  "eventType": "AwsApiCall",
  "recipientAccountId": "999999999999"
}
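For anyone trying to reproduce this, here is a rough sketch of how I compare such events against the rule the stack was supposed to own. It only reads the fields visible in the event above (`eventName`, `requestParameters.groupId`, `ipPermissions.items[].groups.items[].groupId`); the function names `revokedRules` and `unexpectedRevocations` are my own hypothetical helpers, not part of any AWS SDK.

```typescript
// Hypothetical helpers for inspecting an already-downloaded CloudTrail event.
// Field names are taken from the sanitized event shown above.
interface RevokedRule {
  securityGroupId: string;   // the SG the rule was removed from
  sourceGroupIds: string[];  // the SGs the rule allowed ingress from
  fromPort: number;
  toPort: number;
}

function revokedRules(event: any): RevokedRule[] {
  if (event.eventName !== "RevokeSecurityGroupIngress") return [];
  const perms: any[] = event.requestParameters?.ipPermissions?.items ?? [];
  return perms.map((p: any) => ({
    securityGroupId: event.requestParameters.groupId,
    sourceGroupIds: (p.groups?.items ?? []).map((g: any) => g.groupId),
    fromPort: p.fromPort,
    toPort: p.toPort,
  }));
}

// Keep only revocations that touch a source SG other than the expected one.
function unexpectedRevocations(event: any, expectedSource: string): RevokedRule[] {
  return revokedRules(event).filter((r) =>
    r.sourceGroupIds.some((id) => id !== expectedSource)
  );
}

// Trimmed copy of the event above; the stack's own rule used sg-YYYYYY.
const sampleEvent = {
  eventName: "RevokeSecurityGroupIngress",
  requestParameters: {
    groupId: "sg-XXXXX",
    ipPermissions: {
      items: [
        { ipProtocol: "tcp", fromPort: 80, toPort: 80,
          groups: { items: [{ groupId: "sg-ZZZZZ" }] } },
      ],
    },
  },
};
console.log(unexpectedRevocations(sampleEvent, "sg-YYYYYY"));
```

With the event above, the revocation is flagged because its source group is sg-ZZZZZ rather than the sg-YYYYYY that the stack's ingress resource referenced.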
Question
Is there something I'm doing wrong that causes the teardown to delete ingress rules that aren't associated with the CloudFormation?
Has anyone seen this before?