I am seeing this details graph in the Spark UI:

[image: screenshot of the graph in the Spark UI]

I have a couple of questions regarding this graph:

1- Why do scheduler delay and task deserialization take so long compared to computing time? Does this mean something is wrong with how the job is optimized (that is, with my Spark script)?

2- As I understand it, each row corresponds to an executor (see 1/10.42.3.34, 2/10.42.4.160, etc. in the left column). Each executor has 3 cores, but some rows contain 2 colored bars while others contain 3. Why is that? Does each colored bar correspond to a specific core or task? If so, why are there only 2 bars when there are 3 cores? Does that mean 1 core did no work at all?

3- The colored bars in a row never start at the same position. What does that mean? Does it tell us that, although they run in parallel, the tasks do not start at the same time?

4- The same applies to where the bars end: they never end at the same position either.

5- Why do some bars have a yellow ending (indicating shuffle write time) while others do not?

6- Why do some bars have a purple ending (indicating result serialization time) while others do not?

7- Why do some bars end with yellow followed by purple (both shuffle write and result serialization)? What is happening there?

8- At the top of the graph it says "2 secs Across all Tasks". How is that calculated? When I look at the task durations, I don't see a task that takes more than a couple of milliseconds.

[image: screenshot of the task durations]
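To make question 8 concrete: if that figure were simply the sum of per-task times (this is my assumption, not something I have confirmed), then many millisecond-scale tasks could add up to seconds. A minimal Python sketch with made-up durations, where `task_durations_ms` and `total_across_tasks_ms` are hypothetical names and the numbers are not taken from my job:

```python
# Hypothetical per-task durations in milliseconds (made-up values,
# chosen only to illustrate the arithmetic; not measured from my job).
task_durations_ms = [5] * 400  # e.g. 400 tasks of 5 ms each


def total_across_tasks_ms(durations_ms):
    """Sum per-task durations.

    Assumption: the total shown in the UI is a plain sum over tasks.
    """
    return sum(durations_ms)


total_ms = total_across_tasks_ms(task_durations_ms)
print(f"longest single task: {max(task_durations_ms)} ms")
print(f"total across all tasks: {total_ms / 1000:.1f} s")
```

Under that assumption, no single task would need to take anywhere near 2 seconds for the total to reach it. Is that what is happening here?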

I believe understanding this graph is quite important, so any help is appreciated. Cheers.

honor
