I want to run an EMR Spark job that writes its output to S3, terminate the cluster once the job completes, and then submit a custom jar that imports the data into Redshift.
I deploy all my jar files to an S3 folder.
For EMR I use the Airflow EMR/Livy operators to submit the job; they can easily be configured to pick up jars from S3, roughly like the sketch below.
What should I use to submit the custom jar from S3 in Airflow?
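A minimal sketch of how I add the Spark step, assuming a recent Amazon provider package; bucket names, the class name, and the cluster-creation task id are placeholders, not my real values:

```python
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator

# Spark step definition: the jar is pulled straight from S3 by spark-submit
SPARK_STEP = [
    {
        "Name": "spark-job",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--class", "com.example.SparkJob",              # placeholder class
                "s3://my-artifacts-bucket/jars/spark-job.jar",  # jar picked up from S3
                "s3://my-data-bucket/output/",                  # output location
            ],
        },
    }
]

add_spark_step = EmrAddStepsOperator(
    task_id="add_spark_step",
    # placeholder: cluster id pulled from an earlier create-cluster task
    job_flow_id="{{ ti.xcom_pull(task_ids='create_cluster') }}",
    steps=SPARK_STEP,
    aws_conn_id="aws_default",
)
```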
For now I use the Airflow SSHOperator, which does the following (see the sketch after this list):
- Copies the jar file from S3 to a temp folder, if it is not already there
- Runs the jar from the command line using `java -cp`
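Roughly, the SSH task looks like this sketch; the host connection, paths, and class name are placeholders for illustration only:

```python
from airflow.providers.ssh.operators.ssh import SSHOperator

RUN_IMPORT_JAR = """
set -e
JAR=/tmp/redshift-import.jar
# copy the jar from S3 only if it is not already on the box
if [ ! -f "$JAR" ]; then
    aws s3 cp s3://my-artifacts-bucket/jars/redshift-import.jar "$JAR"
fi
# run the importer class from the jar
java -cp "$JAR" com.example.RedshiftImport
"""

import_to_redshift = SSHOperator(
    task_id="import_to_redshift",
    ssh_conn_id="worker_ssh",   # placeholder SSH connection
    command=RUN_IMPORT_JAR,
)
```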
Also, I don't like the idea of running the jar directly on Airflow, since it could overload it if the jar needs a lot of resources or takes a long time to run.
I am wondering: is there a better way to do this?