
I see from my job overview page that my job appears stuck on one of the stages. Most of the others took a reasonable amount of time, but one of them is much slower.

What does it mean when one of my stages is taking so long to finish?

vanhooser

1 Answer


The most likely thing you're suffering from is skew.

Skew is an imbalance in the work done by the tasks of a Spark stage: certain tasks, for whatever reason, take much longer to compute than the others.

It's important to verify that your job actually has skew and not just assume this is the culprit.
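One rough way to check is to compare the slowest task in the stage against the typical one. This is a minimal sketch in plain Python, assuming you have copied the per-task durations (in seconds) out of the stage's task list by hand. The function name `skew_ratio` and the max-over-median heuristic are my own illustration, not something Spark reports.

```python
from statistics import median

def skew_ratio(task_durations):
    """Return slowest task duration divided by the median task duration.

    A ratio near 1 means the tasks are balanced; a large ratio suggests
    that a few tasks dominate the stage. This is a heuristic only.
    """
    if not task_durations:
        raise ValueError("need at least one task duration")
    typical = median(task_durations)
    if typical == 0:
        raise ValueError("median duration is zero, ratio is undefined")
    return max(task_durations) / typical

# Hypothetical durations: four 1 sec tasks and one 5 sec task.
ratio = skew_ratio([1, 1, 1, 1, 5])
```

What counts as "too large" depends on the job, so treat the ratio as a prompt to look closer rather than a verdict.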

One of the most common reasons for skew is an imbalanced distribution of keys going into a shuffle. An example is a join where some keys have a large number of rows on both sides of the join: every matching pair of rows is emitted, so the task handling such a key does far more work than the rest. There are several ways you can verify this distribution problem.
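To see why keys that are heavy on both sides hurt so much, it helps to count rows per key on each input and multiply the counts. The sketch below does that in plain Python on made-up key lists; `join_rows_per_key` and the sample keys are hypothetical names for illustration only.

```python
from collections import Counter

def join_rows_per_key(left_keys, right_keys):
    """Estimate inner-join output rows per key.

    For each key present on both sides, the join emits
    (rows on the left) * (rows on the right) output rows.
    """
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    return {
        key: left_counts[key] * right_counts[key]
        for key in left_counts
        if key in right_counts
    }

# Hypothetical inputs: key "a" is heavy on both sides of the join.
left = ["a"] * 1000 + ["b"] * 10 + ["c"] * 10
right = ["a"] * 500 + ["b"] * 5 + ["d"] * 3

rows = join_rows_per_key(left, right)
# Keys sorted from most to fewest output rows.
heaviest = sorted(rows.items(), key=lambda item: item[1], reverse=True)
```

On real data you would not collect the keys locally. In PySpark the same check is usually a `groupBy` on the join key followed by `count()` on each input, sorted by count descending, and then a look at whether the top keys appear on both sides.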

Sometimes you get unlucky: a task is both longer-running and kicked off at the very end of your stage, so the whole stage has to wait for it and you observe particularly slow stage execution times. Other times you get lucky and it gets kicked off first, so it runs alongside the shorter tasks. In this example, the slower 5 sec task is the skewed task.

[Figure: Unlucky Task]
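The effect of launch order can be sketched with a small simulation. It assumes a simplified scheduler (tasks start in list order on whichever slot frees up first), which is not a faithful model of Spark's scheduler; `stage_duration` and the task lists are hypothetical.

```python
import heapq

def stage_duration(task_durations, slots):
    """Simulate a stage: each task starts on the earliest free slot.

    Returns the time at which the last task finishes.
    """
    if slots < 1:
        raise ValueError("need at least one slot")
    # Each heap entry is the time at which a slot becomes free.
    free_at = [0] * slots
    heapq.heapify(free_at)
    for duration in task_durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)

# Four 1 sec tasks and one skewed 5 sec task, on 2 slots.
lucky = stage_duration([5, 1, 1, 1, 1], slots=2)    # skewed task launched first
unlucky = stage_duration([1, 1, 1, 1, 5], slots=2)  # skewed task launched last
```

With these made-up numbers the same work finishes in 5 seconds when the skewed task starts first and in 7 seconds when it starts last, which is why a stage with skew can look fine on one run and slow on the next.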

vanhooser