
I have an AWS glue job with Spark UI enabled by following this instruction: Enabling the Spark UI for Jobs

The glue job has s3:* access to arn:aws:s3:::my-spark-event-bucket/* resource. But for some reason, when I run the glue job (and it successfully finished within 40-50 seconds and successfully generated the output parquet files), it doesn't generate any spark event logs to the destination s3 path. I wonder what could have gone wrong and if there is any systematic way for me to pinpoint the root cause.
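For reference, one way to sanity-check the configuration is to inspect the job definition with the AWS CLI and then look at what actually landed at the event-log path. This is a sketch: the job name `my-glue-job` is a placeholder, and the bucket name is taken from the policy above.

```shell
# Confirm the Spark UI arguments actually reached the job definition
# (job name is a placeholder).
aws glue get-job --job-name my-glue-job \
  --query 'Job.DefaultArguments' --output json
# Expect entries like:
#   "--enable-spark-ui": "true",
#   "--spark-event-logs-path": "s3://my-spark-event-bucket/..."

# List whatever was written under the event-log path after a run:
aws s3 ls s3://my-spark-event-bucket/ --recursive
```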

  • Double-check that the `Amazon S3 prefix for Spark event logs` has the expected S3 path when you check at the end of a run. It's also worth checking whether the event log was created under the `S3 path where the script is stored` instead. – Amir Maleki Feb 02 '21 at 05:17
  • I have the same problem. While the job is running I can see a file being created in `/tmp/spark-event-logs/` called `spark-application-1612277620995.inprogress`, but at the end the logs are not visible in the specified bucket. I've tried giving more permissions to the Glue IAM role, but that doesn't help. I've tried it in different accounts too, but it doesn't work. – Machiel Feb 02 '21 at 15:18

1 Answer


How long is your Glue job running for?

I found that jobs with short execution times (less than or around one minute) do not reliably produce Spark UI logs in S3.

The AWS documentation states: "Every 30 seconds, AWS Glue flushes the Spark event logs to the Amazon S3 path that you specify." The reason short jobs do not produce Spark UI logs probably has something to do with this flush interval: a job that finishes in 40-50 seconds may terminate before a flush completes.

If you have a job with a short execution time, try adding additional steps to the job, or even a pause/wait, to lengthen the execution time. This should help ensure that the Spark UI logs are sent to S3.
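As a minimal sketch of the pause/wait approach, you can pad the end of the Glue script so its wall-clock time clears the ~30 s flush interval. The helper name `pad_runtime` and the 90-second value are my own choices, not anything from the Glue docs; 90 s is just an assumption that comfortably spans at least two flush cycles.

```python
import time

def pad_runtime(seconds: float) -> float:
    """Sleep to stretch the job's wall-clock time past Glue's
    ~30 s event-log flush interval. Returns the actual elapsed time."""
    start = time.monotonic()
    time.sleep(seconds)
    return time.monotonic() - start

# At the very end of the Glue script, after the parquet write:
# pad_runtime(90)  # assumption: 90 s clears at least two flush cycles
```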