I need to automate PySpark scripts so they execute on a client's existing AWS EMR cluster. The constraints are:
- No SSH access to the cluster's primary (master) node
- Can't create any EC2 instances
- Others in my group add their code to the Steps tab for the running cluster
- I have read/write access to S3
- The cluster remains in a running state; no need to script its stand-up or tear-down
- I have PyCharm Professional
I reviewed this SO post, which is close to what I am after. Ideally, I would use Python with boto3 from PyCharm to submit the PySpark script to their long-running cluster as a step. What would others do here?