I am having trouble running a Databricks notebook (Scala). The job has a high shuffle write size and has already been running for over an hour. Here is the relevant screen:

[screenshot: Spark UI stage details]

Any idea how to check why this is happening?

The stage shows shuffle write: 35.5GB / 1796240509. What is the meaning of 35.5GB and 1796240509?
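For context, shuffle write appears on any stage that ends in a wide transformation (groupBy, join, repartition, etc.), where each executor serializes its partial output to local disk before the next stage fetches it over the network. A minimal hypothetical Scala sketch of the kind of operation that produces this metric (the input path, column name, and output path are illustrative assumptions, not taken from the notebook):

    // Hypothetical job shape: paths and column names are illustrative only.
    val df = spark.read.parquet("/mnt/data/events")  // assumed input

    // groupBy is a wide transformation: every executor serializes its partial
    // counts to local disk (counted as "shuffle write") before the reduce
    // stage fetches them over the network.
    val counts = df.groupBy("userId").count()

    counts.write.mode("overwrite").parquet("/mnt/data/counts")  // assumed output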

mytabi
  • 1796240509 is the number of records, whereas shuffle write is the total size of the serialized data written on all executors before it is transmitted to the next stage. https://stackoverflow.com/questions/27276884/what-is-shuffle-read-shuffle-write-in-apache-spark – anshul_cached Aug 20 '19 at 07:48
  • The number of what records? – mytabi Aug 20 '19 at 08:53
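To confirm what the comment above describes, a listener can print both halves of that metric for every completed stage. This is a minimal sketch, assuming a Databricks Scala notebook where spark is the predefined SparkSession; bytesWritten corresponds to the 35.5GB figure and recordsWritten to the 1796240509 figure:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

    // Log the two shuffle-write metrics for every completed stage.
    spark.sparkContext.addSparkListener(new SparkListener {
      override def onStageCompleted(stage: SparkListenerStageCompleted): Unit = {
        val metrics = stage.stageInfo.taskMetrics
        if (metrics != null) {
          val sw = metrics.shuffleWriteMetrics
          val gb = sw.bytesWritten / math.pow(1024, 3)  // bytes -> GiB, the "35.5GB" part
          // recordsWritten is the row count, the "1796240509" part
          println(f"Stage ${stage.stageInfo.stageId}: shuffle write = $gb%.1f GB / ${sw.recordsWritten} records")
        }
      }
    })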

0 Answers