
This might get a bit convoluted but I'll try to simplify.

I have a CloudFormation template that sets up 3 identical EC2 machines. Using cfn-init from the UserData script, each machine pulls some automation code from S3 and runs it to set the machines up in a very product-specific high-availability configuration that isn't relevant here.

It looks something like this:

"commands" : {
          "0-Tester" : {
            "command" : "echo \"I am OK.\" > \"d:\\test.txt\"",
            "waitAfterCompletion": 5
          },
          "1-Pullcode" : {
            "command" : "aws s3 cp s3://some-bucket/code.zip d:/code.zip > d:/s3sync.log",
            "waitAfterCompletion": 5
          },
          "2-UnpackCode" : {
            "command" : "powershell Expand-Archive -Path d:\\code.zip -DestinationPath d:\\dev",
            "waitAfterCompletion": 5
          },
          "3-ResetLicensing" : {
            "command" : "\"C:/Program Files/something/iisnodeModule/node.exe\" d:/dev/aws-automation/service.js --service Licensing.Service --action restart > d:/oxy_restart.log",
            "waitAfterCompletion": 5
          },
          "4-RunAutomation" : {
            "command" : "\"C:/Program Files/something/iisnodeModule/node.exe\" d:/dev/aws-automation/automate.js --config c:/servers.conf --all > d:/automation.log",
            "waitAfterCompletion": 5
          }
        }
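
For orientation, this "commands" block and the "files" block shown further down both live on each instance resource under Metadata, inside AWS::CloudFormation::Init, roughly like this (trimmed):

    "Metadata": {
      "AWS::CloudFormation::Init": {
        "config": {
          "files":    { ... },
          "commands": { ... }
        }
      }
    }

cfn-init processes the files section before the commands section, so c:\servers.conf is already in place by the time 4-RunAutomation runs.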

For this automation to happen, each machine needs to know the IPs or DNS names of all 3 machines (think creating a MongoDB replica set, for example). To achieve that, the template creates 3 DNS records in Route53 for these EC2 instances (in a private hosted zone) and writes these 3 predictable DNS names to a file on each instance.

That happens like so:

"config": {
        "files": {
          "c:\\servers.conf": {
            "content": {
              "role": "app",
              "servers": [
                {"Fn::Join": [".",["build1",{"Ref": "AWS::StackName"},{"Ref": "HostedZone"}]]},
                {"Fn::Join": [".",["app1",{"Ref": "AWS::StackName"},{"Ref": "HostedZone"}]]},
                {"Fn::Join": [".",["app2",{"Ref": "AWS::StackName"},{"Ref": "HostedZone"}]]}
              ],
              "replSetName": { "Ref": "ReplicaSetName" },
              "ecFolder": { "Ref": "ElasticubeFolder" }
            }
          }
        },
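
For illustration only (with a hypothetical stack name of "mystack", a hosted zone of "example.internal", and made-up values for the ReplicaSetName and ElasticubeFolder parameters), the rendered c:\servers.conf on each instance would look roughly like:

    {
      "role": "app",
      "servers": [
        "build1.mystack.example.internal",
        "app1.mystack.example.internal",
        "app2.mystack.example.internal"
      ],
      "replSetName": "rs0",
      "ecFolder": "d:\\elasticubes"
    }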

And of course, to create the DNS records:

"DNSRecordBuild1": {
  "Type": "AWS::Route53::RecordSet",
  "Properties": {
    "HostedZoneName": {"Ref": "HostedZone"},
    "Name": {
      "Fn::Join": [".",["build1",{"Ref": "AWS::StackName"},{"Ref": "HostedZone"}]]
    },
    "Type": "A",
    "TTL": "900",
    "ResourceRecords": [{"Fn::GetAtt": ["build1","PrivateIp"]}]
  }
}

So at this point, the stack gets created perfectly: CFN spins up the 3 instances, renders a config JSON on each of them with the DNS names that will eventually exist, pulls the code from S3, and starts running the scripts. The R53 records depend on the instances (through Fn::GetAtt), but they are created as soon as each instance is ready, which is before the cfn-init scripts start running. So by the time the scripts refer to those DNS names, the records already exist in R53. This setup works fine.

Now I wanted to add an ELB to the stack, and I wanted it created only when the 3 machines are fully configured and ready for traffic. So I added a DependsOn attribute to the ELB resource, which worked fine, and then added cfn-signal and a CreationPolicy to each instance to make sure it's marked as complete only when the automation scripts finish:

"cfn-signal.exe -e %ERRORLEVEL% --stack ", { "Ref" : "AWS::StackName" }, " --resource build1 --region ", { "Ref" : "AWS::Region" }

and

"CreationPolicy" : {
    "ResourceSignal" : {
      "Timeout" : "PT10M"
    }
  }
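
For context, the cfn-signal line goes at the end of the instance's UserData, after cfn-init has run; a simplified sketch of that Windows UserData (not the exact template):

    "UserData": { "Fn::Base64": { "Fn::Join": ["", [
      "<script>\n",
      "cfn-init.exe -v --stack ", { "Ref": "AWS::StackName" },
      " --resource build1 --region ", { "Ref": "AWS::Region" }, "\n",
      "cfn-signal.exe -e %ERRORLEVEL% --stack ", { "Ref": "AWS::StackName" },
      " --resource build1 --region ", { "Ref": "AWS::Region" }, "\n",
      "</script>"
    ]]}}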

But this immediately breaks the whole process: the R53 records (which reference the instance through Fn::GetAtt) now won't be created until the machine sends its signal, and the machine won't send it because the scripts it's trying to run depend on those DNS names and therefore fail.

For now I've simply removed the signal and policy to let the ELB start up as soon as the instances are online, but that's not ideal. So the question is: how can I delay the creation of the ELB until after the scripts are done, without delaying the R53 resources or creating a dependency loop like the one above?

At the moment my thoughts are:

  1. Adding a wait condition resource to the template, signalling it from the instances, and having the ELB resource DependsOn this wait condition. Not sure this is even possible or what adverse effects it could have (see the sketch after this list).
  2. Not creating the ELB via CFN at all, but rather having it created via the AWS CLI on one of the machines once it can verify all the other machines are ready. I really dislike this approach, as it would require a lot of additional code in my automation scripts and make this resource harder to manage (i.e. I would need to delete the ELB manually when deleting the stack).
  3. Having each instance "signal" using an alternative method, such as putting a file or flag somewhere (like S3), and then having a Lambda react to it and create the ELB. But this has the same disadvantage as #2...
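
From the docs, option 1 would look roughly like this (untested sketch; the logical IDs AutomationWaitHandle, AutomationWaitCondition and LoadBalancer are made up, and the ELB's Properties are omitted):

    "AutomationWaitHandle": {
      "Type": "AWS::CloudFormation::WaitConditionHandle"
    },
    "AutomationWaitCondition": {
      "Type": "AWS::CloudFormation::WaitCondition",
      "Properties": {
        "Handle": { "Ref": "AutomationWaitHandle" },
        "Count": 3,
        "Timeout": "1800"
      }
    },
    "LoadBalancer": {
      "Type": "AWS::ElasticLoadBalancing::LoadBalancer",
      "DependsOn": "AutomationWaitCondition",
      "Properties": { ... }
    }

Each instance would then run cfn-signal against the handle URL ({"Ref": "AutomationWaitHandle"}) at the end of its automation instead of signalling a CreationPolicy, so the instances themselves (and therefore the R53 records) would still complete as early as they do now.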

1 Answer


How about attacking it from a completely different angle and using AWS EC2 Systems Manager (SSM) to configure the instances after all the resources (i.e. the 3x EC2, DNS and ELB) are created?

The SSM association would depend on the creation of all the above, then run the configuration on each instance (through the SSM agent) and start up the app. That should resolve the dependency problems.

Check out AWS::SSM::Association and AWS::SSM::Document for the CloudFormation support of EC2 Systems Manager.
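
Something along these lines might work (just a sketch, untested; the logical IDs for the ELB and the other two instances are guesses based on your template, and the document content is only an example):

    "ConfigureDocument": {
      "Type": "AWS::SSM::Document",
      "Properties": {
        "DocumentType": "Command",
        "Content": {
          "schemaVersion": "2.2",
          "description": "Run the product automation once all stack resources exist",
          "mainSteps": [{
            "action": "aws:runPowerShellScript",
            "name": "runAutomation",
            "inputs": {
              "runCommand": [
                "& 'C:/Program Files/something/iisnodeModule/node.exe' d:/dev/aws-automation/automate.js --config c:/servers.conf --all"
              ]
            }
          }]
        }
      }
    },
    "ConfigureAssociation": {
      "Type": "AWS::SSM::Association",
      "DependsOn": ["DNSRecordBuild1", "LoadBalancer"],
      "Properties": {
        "Name": { "Ref": "ConfigureDocument" },
        "Targets": [{
          "Key": "InstanceIds",
          "Values": [{ "Ref": "build1" }, { "Ref": "app1" }, { "Ref": "app2" }]
        }]
      }
    }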

Hope that helps :)

MLu
  • Hey, I've started going down that path for a different purpose (automating processes post setup), got SSM to work on my instances and sometime in the near future I'll consider switching the deployment to SSM as well. If/once I do I'll mark this as accepted - just wish there was a better way as *that's what cfn-init is for*.. Thanks again, m8. – motig88 Aug 07 '17 at 12:51