I'm looking for an end-to-end example of launching an AWS EMR cluster with a PySpark step and having it terminate automatically when the step finishes or fails.
I've seen pieces of this explained, but not one complete example.
First of all, go through the AWS EMR documentation, which describes all the available API operations:
https://docs.aws.amazon.com/emr/latest/APIReference/API_Operations.html
There are two options you can use to access AWS services:
1) boto3: http://boto3.readthedocs.io/en/latest/index.html
boto3 is the AWS SDK for Python; it provides a set of functions to control the different AWS services.
2) aws-cli: https://github.com/aws/aws-cli
This provides a command-line client for the APIs of the different AWS services.
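For the task in the question, boto3 can do it with a single `run_job_flow` call: put the PySpark script in a step whose `ActionOnFailure` is `TERMINATE_CLUSTER`, and set `KeepJobFlowAliveWhenNoSteps` to `False` so the cluster shuts down once it has no steps left. This is a sketch under assumptions, not a definitive recipe: it assumes boto3 is installed, credentials are configured, the default EMR roles exist (created with `aws emr create-default-roles`), and the script is already in S3. The S3 paths, region, release label, instance types and the helper names `build_job_flow_request` / `run_and_wait` are hypothetical placeholders to adapt. boto3 is imported inside `run_and_wait` so the request builder can be inspected without AWS access.

```python
def build_job_flow_request(script_uri, log_uri, release_label="emr-6.15.0"):
    """Hypothetical helper: keyword arguments for emr.run_job_flow that start
    Spark, run one PySpark step, and let the cluster terminate itself."""
    step = {
        "Name": "pyspark-step",
        # If the step fails, terminate the cluster instead of leaving it idle.
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            # command-runner.jar runs spark-submit on the master node.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }
    return {
        "Name": "pyspark-auto-terminate",
        "ReleaseLabel": release_label,  # placeholder: pick the release you need
        "LogUri": log_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # placeholder sizing
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # False = shut the cluster down once there are no steps left to run.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [step],
        # Default roles created by `aws emr create-default-roles` (assumption).
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


def run_and_wait(request, region_name="us-east-1"):
    """Launch the cluster and block until it terminates.

    Returns (cluster_id, succeeded)."""
    import boto3  # imported lazily so build_job_flow_request works without boto3
    from botocore.exceptions import WaiterError

    emr = boto3.client("emr", region_name=region_name)
    cluster_id = emr.run_job_flow(**request)["JobFlowId"]
    print("Launched cluster", cluster_id)
    try:
        # Polls the cluster state until it reaches TERMINATED (success).
        # Raises WaiterError if it ends in TERMINATED_WITH_ERRORS (for example
        # after a step failure) or if the attempts below run out.
        emr.get_waiter("cluster_terminated").wait(
            ClusterId=cluster_id,
            WaiterConfig={"Delay": 60, "MaxAttempts": 120},
        )
        return cluster_id, True
    except WaiterError:
        return cluster_id, False
```

For example, `run_and_wait(build_job_flow_request("s3://my-bucket/jobs/job.py", "s3://my-bucket/emr-logs/"))` would launch the cluster and block until it has terminated, with the cluster's logs archived under the `LogUri` prefix. Keeping the request as a plain dict makes it easy to review or reuse before anything is launched.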
You can use either of these tools for your task; both are well documented.
For EMR specifically, refer to the following documents:
http://boto3.readthedocs.io/en/latest/reference/services/emr.html
https://github.com/aws/aws-cli/tree/develop/awscli/examples/emr
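Once a cluster has shut itself down, `describe_cluster` from the boto3 EMR client reports why, through `Cluster.Status.State` and `Cluster.Status.StateChangeReason.Code`. A minimal sketch, assuming boto3 and configured credentials; `summarize_cluster_status` and `fetch_cluster_status` are hypothetical helper names, and the parsing is kept separate from the API call so it can be checked against a sample response without touching AWS:

```python
def summarize_cluster_status(response):
    """Reduce a describe_cluster response to (state, reason_code, finished_ok)."""
    status = response["Cluster"]["Status"]
    state = status["State"]
    reason = status.get("StateChangeReason", {}).get("Code")
    # A cluster that ran all its steps and shut itself down typically ends in
    # TERMINATED with reason ALL_STEPS_COMPLETED; a failed step with
    # ActionOnFailure=TERMINATE_CLUSTER typically ends in
    # TERMINATED_WITH_ERRORS with reason STEP_FAILURE.
    finished_ok = state == "TERMINATED" and reason == "ALL_STEPS_COMPLETED"
    return state, reason, finished_ok


def fetch_cluster_status(cluster_id, region_name="us-east-1"):
    """Call the EMR API and summarize the cluster's current status."""
    import boto3  # imported lazily so the parser above works without boto3

    emr = boto3.client("emr", region_name=region_name)
    return summarize_cluster_status(emr.describe_cluster(ClusterId=cluster_id))
```

A cluster that is still working reports states such as `RUNNING`, for which `finished_ok` is simply `False`.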
Try out some of these APIs, and feel free to ask for help if you get stuck.