This might get a bit convoluted, but I'll try to simplify.
I have a CloudFormation template setting up 3 identical EC2 machines. Using cfn-init in the UserData script, each one pulls some automation code from S3 and runs it to set these machines up in a very product-specific high-availability configuration that isn't relevant here. It looks something like this:
"commands" : {
"0-Tester" : {
"command" : "echo \"I am OK.\" > \"d:\\test.txt\"",
"waitAfterCompletion": 5
},
"1-Pullcode" : {
"command" : "aws s3 cp s3://some-bucket/code.zip d:/code.zip > d:/s3sync.log",
"waitAfterCompletion": 5
},
"2-UnpackCode" : {
"command" : "powershell Expand-Archive -Path d:\\code.zip -DestinationPath d:\\dev",
"waitAfterCompletion": 5
},
"3-ResetLicensing" : {
"command" : "\"C:/Program Files/something/iisnodeModule/node.exe\" d:/dev/aws-automation/service.js --service Licensing.Service --action restart > d:/oxy_restart.log",
"waitAfterCompletion": 5
},
"4-RunAutomation" : {
"command" : "\"C:/Program Files/something/iisnodeModule/node.exe\" d:/dev/aws-automation/automate.js --config c:/servers.conf --all > d:/automation.log",
"waitAfterCompletion": 5
}
}
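For context, these commands sit in the instance's cfn-init metadata, and the UserData itself does little more than invoke cfn-init. Roughly, that invocation looks like this for the build1 instance (a simplified sketch, not the exact UserData from my template):

"UserData": { "Fn::Base64": { "Fn::Join": ["", [
  "<script>\n",
  "cfn-init.exe -v --stack ", { "Ref": "AWS::StackName" },
  " --resource build1 --region ", { "Ref": "AWS::Region" }, "\n",
  "</script>\n"
]]}}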
For this automation to happen, each machine needs to know the IPs or DNS names of all 3 machines (think creating a MongoDB replica set, for example). To achieve that, the template creates 3 DNS records in Route 53 for these EC2s (in a private hosted zone) and renders the 3 predictable DNS names into a file on each EC2 instance. That happens like so:
"config": {
"files": {
"c:\\servers.conf": {
"content": {
"role": "app",
"servers": [
{"Fn::Join": [".",["build1",{"Ref": "AWS::StackName"},{"Ref": "HostedZone"}]]},
{"Fn::Join": [".",["app1",{"Ref": "AWS::StackName"},{"Ref": "HostedZone"}]]},
{"Fn::Join": [".",["app2",{"Ref": "AWS::StackName"},{"Ref": "HostedZone"}]]}
],
"replSetName": { "Ref": "ReplicaSetName" },
"ecFolder": { "Ref": "ElasticubeFolder" }
}
}
},
And of course, to create the DNS records:
"DNSRecordBuild1": {
"Type": "AWS::Route53::RecordSet",
"Properties": {
"HostedZoneName": {"Ref": "HostedZone"},
"Name": {
"Fn::Join": [".",["build1",{"Ref": "AWS::StackName"},{"Ref": "HostedZone"}]]
},
"Type": "A",
"TTL": "900",
"ResourceRecords": [{"Fn::GetAtt": ["build1","PrivateIp"]}]
}
}
So at this point, the stack gets created perfectly: CFN spins up the 3 instances, renders a config JSON on each of them with the DNS names that will eventually exist, pulls the code from S3, and starts running the scripts. Now, the R53 records obviously depend on these instances being up, but they are created as soon as each instance is ready, which is before the scripts start running; thus by the time the scripts refer to said DNS names, they already exist in R53. This setup works fine.
Now, I wanted to add an ELB to the stack, and I wanted it to be created only when the 3 machines are fully configured and ready for traffic. So I added a DependsOn attribute to the ELB resource, which worked fine, and then added cfn-signal and a CreationPolicy to each instance to make sure it's marked as done only when the automation scripts finish:
"cfn-signal.exe -e %ERRORLEVEL% --stack ", { "Ref" : "AWS::StackName" }, " --resource build1 --region ", { "Ref" : "AWS::Region" }
and
"CreationPolicy" : {
"ResourceSignal" : {
"Timeout" : "PT10M"
}
}
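For reference, the ELB is wired up along these lines (a simplified sketch: the listener, the subnet parameter, and the app1/app2 logical IDs are placeholders, but the DependsOn is the relevant part):

"LoadBalancer": {
  "Type": "AWS::ElasticLoadBalancing::LoadBalancer",
  "DependsOn": ["build1", "app1", "app2"],
  "Properties": {
    "Subnets": [{ "Ref": "PrivateSubnet" }],
    "Instances": [{ "Ref": "build1" }, { "Ref": "app1" }, { "Ref": "app2" }],
    "Listeners": [
      { "LoadBalancerPort": "80", "InstancePort": "80", "Protocol": "HTTP" }
    ]
  }
}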
But this immediately breaks the whole process: now the R53 records won't get created until the machine sends the signal (they reference the instance via Fn::GetAtt, and with a CreationPolicy the instance only reaches CREATE_COMPLETE once the signal arrives), and the machine won't send the signal because it's trying to run the scripts that depend on those DNS names, so they fail.
For now I've simply removed the signal and the CreationPolicy to let the ELB start up as soon as the instances are online, but that's not ideal. So the question is: how can I delay the creation of the ELB until after the scripts are done, without delaying the R53 resources or creating dependency loops like the one above?
At the moment my thoughts are:

- Adding a wait condition resource to the template, signalling to it, and having the ELB resource DependsOn this wait condition - not sure this is even possible or what adverse effects it could have (see the sketch after this list).
- Not creating the ELB via CFN at all, but rather having it created via the AWS CLI on one of the machines, once that machine can verify all the others are ready. I really dislike this approach, as it would require a lot of additional code in my automation scripts and make the resource harder to manage (i.e. the ELB would need to be deleted manually when deleting the stack).
- Having each instance "signal" using an alternative method, such as putting a file or flag somewhere (like S3), and having a Lambda react to it and create the ELB - but this has the same disadvantage as #2...
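To illustrate option 1, roughly what I have in mind (the resource names are made up; each instance would get the handle's URL, e.g. rendered into the config file or UserData via Ref, and the last automation command would run cfn-signal against that URL instead of signalling the instance resource):

"AutomationDoneHandle": {
  "Type": "AWS::CloudFormation::WaitConditionHandle"
},
"AutomationDone": {
  "Type": "AWS::CloudFormation::WaitCondition",
  "DependsOn": ["build1", "app1", "app2"],
  "Properties": {
    "Handle": { "Ref": "AutomationDoneHandle" },
    "Count": "3",
    "Timeout": "1800"
  }
}

The ELB would then declare "DependsOn": "AutomationDone" instead of depending on the instances directly, and since the R53 records don't reference the wait condition, they should still be created as soon as the instances are up.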