
I'm using AWS Glue job run auto scaling for the number of workers.

After analysing a few metrics of the Glue job run, I've found that the job always uses MaxNumberOfWorkers, even with auto scaling active.

Is there any way to optimize the job?

I want both good performance and good savings from auto scaling. Thank you.

Klaus

1 Answer

Glue is doing a good job of splitting your job into parallel tasks, each running on a fraction of the data.

The Glue service charges per DPU-hour, so for a job that parallelizes well, running on 1 DPU or on 10 DPUs usually costs about the same (1 DPU for 10 hours and 10 DPUs for 1 hour are both 10 DPU-hours); the latter will of course finish faster.
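
To make the DPU-hour arithmetic concrete, here is a minimal Python sketch. The price per DPU-hour is a placeholder assumption (check the AWS Glue pricing page for your region), and the function ignores any minimum billing duration. For G.1X workers, one worker corresponds to 1 DPU.

```python
# Placeholder price per DPU-hour (an assumption; check AWS Glue pricing for your region).
PRICE_PER_DPU_HOUR = 0.44


def glue_run_cost(dpus: float, runtime_hours: float,
                  price_per_dpu_hour: float = PRICE_PER_DPU_HOUR) -> float:
    """Cost of one run billed per DPU-hour (ignores minimum billing duration)."""
    return dpus * runtime_hours * price_per_dpu_hour


# An ideally parallel job: 10x the DPUs finishes in 1/10 of the time,
# so both runs consume 10 DPU-hours.
serial = glue_run_cost(dpus=1, runtime_hours=10)
parallel = glue_run_cost(dpus=10, runtime_hours=1)
print(f"1 DPU for 10 h:  ${serial:.2f}")
print(f"10 DPUs for 1 h: ${parallel:.2f}")
```

If the runtime does not shrink in proportion to the extra DPUs, the larger run costs more, not the same.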

A rule of thumb is to increase MaxNumberOfWorkers progressively. As long as the execution time keeps decreasing roughly linearly (doubling the workers roughly halves the runtime), you are on the right path.
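
The rule of thumb can be checked with a few timed runs. The sketch below uses hypothetical runtimes; it compares each run's actual speedup with the ideal linear speedup relative to the smallest run and keeps the largest worker count reached before efficiency drops below a threshold. The 0.8 threshold and the function names are assumptions for illustration only.

```python
from typing import List, Optional, Tuple


def scaling_efficiency(runs: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Return (workers, efficiency) for each (workers, runtime) pair.

    Efficiency = actual speedup / ideal linear speedup, both measured
    against the run with the fewest workers. 1.0 means perfectly linear.
    """
    runs = sorted(runs)
    base_workers, base_runtime = runs[0]
    result = []
    for workers, runtime in runs:
        ideal_speedup = workers / base_workers
        actual_speedup = base_runtime / runtime
        result.append((workers, actual_speedup / ideal_speedup))
    return result


def pick_max_workers(runs: List[Tuple[int, float]],
                     min_efficiency: float = 0.8) -> Optional[int]:
    """Largest worker count reached before efficiency first drops below the threshold."""
    best = None
    for workers, efficiency in scaling_efficiency(runs):
        if efficiency < min_efficiency:
            break
        best = workers
    return best


# Hypothetical (MaxNumberOfWorkers, runtime in minutes) measurements.
runs = [(10, 60.0), (20, 31.0), (40, 17.0), (80, 14.0)]
for workers, efficiency in scaling_efficiency(runs):
    print(f"{workers:>3} workers: efficiency {efficiency:.2f}")
print("suggested MaxNumberOfWorkers:", pick_max_workers(runs))
```

In this made-up data, doubling from 40 to 80 workers only cuts the runtime from 17 to 14 minutes, so the extra workers mostly add cost.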

You can look at the job metrics to find the optimal value for MaxNumberOfWorkers by checking the maximum number of executors the job needed. Have a look at this page for more information.
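
As a rough sketch of turning the executor metric into a worker count: it assumes you have already exported samples of a metric such as `glue.driver.ExecutorAllocationManager.executors.numberMaxNeededExecutors` from CloudWatch, that each G.1X worker hosts one executor, and that one worker is taken by the Spark driver. Those mappings, the `headroom` factor, and the sample values are assumptions to verify against the Glue documentation for your worker type and Glue version.

```python
import math
from typing import Iterable


def suggest_max_workers(max_needed_executors: Iterable[float],
                        executors_per_worker: int = 1,
                        driver_workers: int = 1,
                        headroom: float = 1.0) -> int:
    """Suggest a MaxNumberOfWorkers value from metric samples.

    Assumes each worker hosts `executors_per_worker` Spark executors and
    that `driver_workers` workers are reserved for the driver. `headroom`
    scales the observed peak (e.g. 1.1 for 10% extra). Raises ValueError
    if no samples are given.
    """
    peak = max(max_needed_executors)
    executor_workers = math.ceil(peak * headroom / executors_per_worker)
    return executor_workers + driver_workers


# Made-up samples of the max-needed-executors gauge during one run.
samples = [12, 35, 48, 48, 30]
print(suggest_max_workers(samples))  # peak of 48 executors plus 1 driver worker
print(suggest_max_workers(samples, headroom=1.1))
```

If the suggested value is well above your current MaxNumberOfWorkers, the cap is probably what keeps auto scaling pinned at its maximum.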

Some jobs will not scale. For instance, if you are doing complex joins, Glue cannot efficiently run the subtasks in parallel. In that case you will have to review and optimise your query to make better use of the technology.
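
One common reason a join stops scaling is key skew: in a shuffle join, all rows with the same key end up in the same task, so adding workers cannot shrink that task. The hypothetical helper below computes a rough upper bound on speedup from the join-key distribution, assuming work is proportional to row count and ignoring mitigations such as salting or Spark's adaptive skew-join handling.

```python
from collections import Counter
from typing import Hashable, Iterable


def max_join_speedup(join_keys: Iterable[Hashable]) -> float:
    """Rough upper bound on the parallel speedup of a shuffle join.

    All rows sharing a key go to the same task. If the most frequent key
    holds a fraction f of the rows, that one task does at least f of the
    work, so speedup <= 1 / f no matter how many workers are added.
    Assumes work is proportional to row count; ignores salting and
    adaptive skew-join splitting.
    """
    counts = Counter(join_keys)
    total = sum(counts.values())
    hottest = max(counts.values())
    return total / hottest


# Hypothetical key columns: one skewed, one uniform.
skewed = ["A"] * 500 + ["B"] * 250 + [f"k{i}" for i in range(250)]
uniform = [f"k{i}" for i in range(1000)]
print(max_join_speedup(skewed))   # bounded by the hot key "A"
print(max_join_speedup(uniform))
```

When a single key dominates like this, salting the key or broadcasting the smaller table are common ways to restructure the query.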

MarcC