I'm looking for an end-to-end example of launching an AWS EMR cluster with a PySpark step and having it terminate automatically when the step finishes or fails.

I've seen pieces of this explained but not one complete example.

Fred R.

1 Answer

First, go through the AWS documentation for EMR, which describes all the available API operations:

https://docs.aws.amazon.com/emr/latest/APIReference/API_Operations.html

There are two ways to access AWS services:

1) boto3 : http://boto3.readthedocs.io/en/latest/index.html

boto3 provides Python functions for controlling the various AWS services.

2) aws-cli : https://github.com/aws/aws-cli

It provides a command-line client for the AWS APIs of the various services.

You can use either of these tools for your task; both are well documented.

For EMR specifically, refer to the following documents:

http://boto3.readthedocs.io/en/latest/reference/services/emr.html

https://github.com/aws/aws-cli/tree/develop/awscli/examples/emr

Try out some of these APIs, and feel free to ask for help if you get stuck.
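To make this concrete, here is a minimal sketch of the boto3 route: launch a cluster with a single PySpark step and let EMR shut it down when the step succeeds or fails. It relies on two settings of `run_job_flow`: `KeepJobFlowAliveWhenNoSteps=False` (terminate once no steps remain) and `ActionOnFailure="TERMINATE_CLUSTER"` on the step. Everything else is an assumption for illustration: the helper names, the cluster and step names, the release label, the instance type and count, the region, and the default EMR roles (`EMR_DefaultRole`, `EMR_EC2_DefaultRole`), which must already exist in your account.

```python
def build_run_job_flow_request(script_uri, log_uri,
                               release_label="emr-6.15.0",   # assumed; pick a release you use
                               instance_type="m5.xlarge",    # assumed instance type
                               instance_count=3):            # assumed cluster size
    """Build the keyword arguments for EMR.Client.run_job_flow (hypothetical helper)."""
    return {
        "Name": "pyspark-auto-terminate",
        "LogUri": log_uri,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Shut the cluster down as soon as there are no steps left to run.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [{
            "Name": "pyspark-step",
            # If the step fails, terminate the cluster instead of leaving it waiting.
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
            },
        }],
        # Default roles; assumed to exist already in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


def launch_and_wait(request, region_name="us-east-1"):
    """Start the cluster, block until it terminates, and return (id, state, reason)."""
    import boto3  # imported here so the request builder works without boto3 installed
    from botocore.exceptions import WaiterError

    emr = boto3.client("emr", region_name=region_name)
    cluster_id = emr.run_job_flow(**request)["JobFlowId"]
    try:
        emr.get_waiter("cluster_terminated").wait(ClusterId=cluster_id)
    except WaiterError:
        # Raised when the cluster ends with errors or when polling times out;
        # in both cases we fall through and report the current status.
        pass
    status = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["Status"]
    return cluster_id, status["State"], status["StateChangeReason"].get("Code")
```

With a hypothetical bucket, `launch_and_wait(build_run_job_flow_request("s3://my-bucket/job.py", "s3://my-bucket/logs/"))` would start the cluster and return once it is gone. Keeping the request builder separate from the AWS call makes the configuration easy to inspect before anything is launched.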

Harsh Bafna
  • These allow you to start instances, not necessarily submit code – OneCricketeer Jan 27 '18 at 16:00
  • You can submit your code as steps to EMR using these APIs. http://boto3.readthedocs.io/en/latest/reference/services/emr.html#EMR.Client.add_job_flow_steps https://github.com/aws/aws-cli/blob/develop/awscli/examples/emr/add-steps.rst – Harsh Bafna Jan 27 '18 at 16:02
  • Thank you for your answer. I am familiar with those sources. My question was about examples or tutorials on how to do this. An example that shows it end to end. – Fred R. Jan 28 '18 at 16:02
  • With boto3, the run_job_flow, add_job_flow_steps and terminate_job_flows functions will do the job for a start. Please read the documentation; it is very rich. You won't find many examples related to boto. – Harsh Bafna Jan 28 '18 at 16:08
  • For the AWS CLI, check the Git repository; the examples are documented there as well. But you will have to execute the commands in a shell from Python, whereas boto3 gives you Python functions. – Harsh Bafna Jan 28 '18 at 16:10
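
The CLI route mentioned in the last comment can be sketched the same way: build an `aws emr create-cluster` command and run it from Python with `subprocess`. The `--auto-terminate` flag and `ActionOnFailure=TERMINATE_CLUSTER` on the step are what shut the cluster down after the step finishes or fails. The helper names, cluster and step names, release label, instance settings, and the use of `--use-default-roles` are assumptions for illustration, and the AWS CLI is assumed to be installed and configured.

```python
import subprocess


def build_create_cluster_command(script_uri, log_uri,
                                 release_label="emr-6.15.0",  # assumed release
                                 instance_type="m5.xlarge",   # assumed instance type
                                 instance_count=3):           # assumed cluster size
    """Return the argument list for `aws emr create-cluster` (hypothetical helper)."""
    # Shorthand step syntax: a Spark step that runs the PySpark script in cluster mode.
    step = (
        "Type=Spark,Name=pyspark-step,ActionOnFailure=TERMINATE_CLUSTER,"
        "Args=[--deploy-mode,cluster,{}]".format(script_uri)
    )
    return [
        "aws", "emr", "create-cluster",
        "--name", "pyspark-auto-terminate",
        "--release-label", release_label,
        "--applications", "Name=Spark",
        "--instance-type", instance_type,
        "--instance-count", str(instance_count),
        "--use-default-roles",
        "--log-uri", log_uri,
        "--auto-terminate",  # terminate the cluster once all steps are done
        "--steps", step,
    ]


def run_create_cluster(command):
    """Run the command and return the CLI's output, which includes the cluster ID."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout
```

Passing the command as a list avoids shell quoting problems with the bracketed `Args=[...]` value.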