
I am trying to set up some simple code to run when I spin up an EMR cluster for the ad hoc work I have to do from time to time.

Right now I run the 'aws emr create-cluster' command, wait for the cluster to be created, look up its DNS name in the console, and then use ssh to connect.

I'd like to skip the console entirely and use the cluster ID to get the DNS name for my SSH connection, but I don't see an obvious command for this. I'm new to the CLI, so I imagine this is a simple task I'm just failing to figure out myself.

In my mind the solution should be something along the lines of

aws emr create-cluster [config for cluster here] > file.txt
set DNS = aws emr describe-cluster --cluster-id file.txt -MasterPublicDnsName
ssh -i Desktop/AWS/EMRKey.pem -o ServerAliveInterval=15 hadoop@$DNS

I will probably have to prepend 'hadoop@' to the DNS variable before passing it to ssh, but for now I'm more curious whether the above makes any functional sense and, if so, how I can get describe-cluster to output the master public DNS name. -MasterPublicDnsName is obviously something I made up, not an actual option I have found.

Dick McManus

1 Answer


The AWS CLI has a --query option that filters a command's output using a JMESPath expression. You'll also want to use a waiter to make sure the cluster is up before you try to connect to it.

You could simply run

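cluster_id="j-2RNBSZZBLXTZ0"
# Block until the cluster is up (RUNNING or WAITING)
aws emr wait cluster-running --cluster-id "$cluster_id"
# --query picks out one field; --output text prints it without JSON quotes
hostname=$(aws emr describe-cluster --output text --cluster-id "$cluster_id" --query Cluster.MasterPublicDnsName)
# Key path taken from the question
ssh -i Desktop/AWS/EMRKey.pem hadoop@"$hostname"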

That should work!
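To drop both the file.txt step from the question and the hard-coded cluster ID, the create and describe calls can be chained. This is a minimal sketch, assuming bash and an installed, configured AWS CLI; emr_create_cluster_id and emr_master_target are hypothetical helper names, and the create-cluster options are left for you to supply. It relies on create-cluster returning a ClusterId field, which --query can extract directly.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes the AWS CLI is installed and configured.

# Create a cluster and print only its ID (e.g. j-XXXXXXXXXXXXX).
# Pass your own create-cluster options as arguments.
emr_create_cluster_id() {
  aws emr create-cluster "$@" --query 'ClusterId' --output text
}

# Wait for the cluster, then print "hadoop@<master public DNS name>".
emr_master_target() {
  local cluster_id="$1" dns
  # Non-zero exit if the cluster terminates or the waiter gives up.
  aws emr wait cluster-running --cluster-id "$cluster_id" || return 1
  dns=$(aws emr describe-cluster --cluster-id "$cluster_id" \
          --query 'Cluster.MasterPublicDnsName' --output text) || return 1
  # Text output may show "None" when the field is empty or absent.
  if [ -z "$dns" ] || [ "$dns" = "None" ]; then
    echo "no public DNS name for $cluster_id" >&2
    return 1
  fi
  printf 'hadoop@%s\n' "$dns"
}

# Usage (create-cluster options elided, as in the question):
#   cluster_id=$(emr_create_cluster_id <your create-cluster options>)
#   target=$(emr_master_target "$cluster_id") &&
#     ssh -i Desktop/AWS/EMRKey.pem -o ServerAliveInterval=15 "$target"
```

Capturing the target in a variable before calling ssh means a failed wait or a missing DNS name stops the chain instead of handing ssh an empty host.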

Randall Hunt
    Thank you! Those are exactly the kind of items I was looking for, and this will help a lot. In the meantime I had just been setting a timeout count to handle the startup problem, and I was going to tackle that next, so I definitely appreciate the suggestion. – Dick McManus Apr 16 '19 at 19:37
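If you would rather keep your own timeout count than rely on the waiter's built-in limits, one option is to poll the cluster state yourself. This is a minimal sketch, assuming bash; emr_wait_ready and its interval and max_tries defaults are hypothetical choices, not AWS defaults. It reads Cluster.Status.State and, as I understand the cluster-running waiter, treats the same states as ready (RUNNING or WAITING) and as failed (the TERMINATING/TERMINATED family).

```shell
#!/usr/bin/env bash
# Hypothetical polling helper; assumes the AWS CLI is installed and configured.
# Usage: emr_wait_ready <cluster-id> [interval-seconds] [max-tries]
emr_wait_ready() {
  local cluster_id="$1" interval="${2:-30}" max_tries="${3:-40}" state i
  for ((i = 1; i <= max_tries; i++)); do
    state=$(aws emr describe-cluster --cluster-id "$cluster_id" \
              --query 'Cluster.Status.State' --output text) || return 1
    case "$state" in
      RUNNING|WAITING)
        return 0 ;;
      TERMINATING|TERMINATED|TERMINATED_WITH_ERRORS)
        echo "cluster $cluster_id is $state" >&2
        return 1 ;;
    esac
    # Still starting or bootstrapping: wait and try again.
    sleep "$interval"
  done
  echo "timed out waiting for $cluster_id" >&2
  return 1
}
```

With the defaults above the loop gives up after roughly 20 minutes; tune interval and max_tries to how long your clusters usually take to bootstrap.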