
I want to run an EMR Spark job that writes its output to S3, terminate the cluster when the job completes, and then submit a custom jar that imports the data into Redshift.

I deploy all my jar files to an S3 folder.

For EMR I am using the Airflow EMR/Livy operators to submit the job; these can easily be configured to pick up jars from S3.
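For context, here is a minimal sketch of how that EMR step is added from Airflow. The bucket, jar path, main class, cluster task ID, and connection ID are placeholders, and the import path may differ depending on the Airflow / Amazon provider version:

```python
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator

# Spark step that picks up the job jar directly from S3 (paths are placeholders)
SPARK_STEP = [
    {
        "Name": "spark-job",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--class", "com.example.SparkJob",      # placeholder main class
                "s3://my-bucket/jars/spark-job.jar",    # jar stored in S3
                "s3://my-bucket/output/",               # output location in S3
            ],
        },
    }
]

add_spark_step = EmrAddStepsOperator(
    task_id="add_spark_step",
    # cluster id pulled from an earlier create-cluster task (placeholder task id)
    job_flow_id="{{ ti.xcom_pull(task_ids='create_emr_cluster', key='return_value') }}",
    aws_conn_id="aws_default",
    steps=SPARK_STEP,
)
```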

What should I use in Airflow to submit the custom jar from S3?

For now I use the Airflow SSH operator, which does the following (a rough sketch follows the list):

  1. Copies the jar file from S3 to a temp folder, if it does not already exist
  2. Runs the jar from the command line using java -cp
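The SSH task looks roughly like the sketch below; the SSH connection, jar locations, and main class are placeholders for illustration:

```python
from airflow.providers.ssh.operators.ssh import SSHOperator

JAR_S3 = "s3://my-bucket/jars/redshift-import.jar"   # assumed jar location in S3
JAR_LOCAL = "/tmp/redshift-import.jar"               # temp folder on the remote host

submit_import_jar = SSHOperator(
    task_id="submit_redshift_import",
    ssh_conn_id="ssh_default",                       # placeholder SSH connection
    command=(
        # 1. copy the jar from S3 only if it is not already present locally
        f"[ -f {JAR_LOCAL} ] || aws s3 cp {JAR_S3} {JAR_LOCAL} && "
        # 2. run the jar's main class (placeholder class name)
        f"java -cp {JAR_LOCAL} com.example.RedshiftImport"
    ),
)
```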

Also, I don't like the idea of running the jar directly on the Airflow host, as it can overload the worker if the jar needs a lot of resources or time to run.

I am wondering, is there a better way to do this?

Grish
