
I have an 8-node Spark standalone cluster with 880 GB of RAM and 224 cores in total.

I just can't explain why the Shuffle Read Blocked Time is so long: about 20 minutes per task. Do you have an idea why? What is the bottleneck in such a case?

To give more details, you can see the details for the stage below:

[Screenshot: details for the stage in Apache Spark]

The task metrics from the Spark UI are below:

[Screenshot: summary metrics for completed tasks, from the Spark UI]

The aggregated metrics per executor are below:

[Screenshot: aggregated metrics per executor, from the Spark UI]

And the full DAG for the stage:

[Screenshot: stage DAG]

The Executors tab:

[Screenshot: Executors tab]

The list of stages:

[Screenshot: list of stages]

Thank you!

Klun
  • https://stackoverflow.com/questions/45740567/spark-shuffle-read-takes-significant-time-for-small-data could be related – Vish Sep 14 '21 at 21:30
  • Related to storage? Yes, it could be. Could you elaborate a bit, please? In fact, the underlying storage is GPFS (IBM's equivalent to HDFS), and we can't have IBM FPO enabled (so to be clear: no local disk for spark.local.dir; the GPFS distributed filesystem is also used for Spark's local dir, which might come with some latency) – Klun Sep 14 '21 at 21:36
  • Related to GC? Yes, it could be too. I have 8 executors (one executor = one node), so each executor has about 100 GB of free memory. I use the default GC. Do you have some tips here to improve my ETL? When I look at the "Executors" tab in the UI, the "GC" column doesn't appear in red, however – Klun Sep 14 '21 at 21:39
  • Yup, I was only pointing toward latency/throughput, but the link posted above also hints at GC in previous-stage executors causing this issue – Vish Sep 14 '21 at 21:40
  • Can you please check the GC time in the previous stage? – Vish Sep 14 '21 at 21:41
  • I have added the "Executors tab" at the end of my message. GC seems OK – Klun Sep 14 '21 at 21:43
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/237109/discussion-between-vish-and-klun). – Vish Sep 14 '21 at 21:45
  • It might not be related to your issue, but I find 120 GB of memory spill quite suspicious – BlueSheepToken Sep 14 '21 at 21:46
  • Could you please elaborate, @BlueSheepToken? :) – Klun Sep 14 '21 at 21:51
  • I am on my phone right now; I can elaborate tomorrow. Meanwhile, can you try to increase the number of shuffle partitions? – BlueSheepToken Sep 14 '21 at 21:54
  • Yes, I will try to increase the number of shuffle partitions tomorrow. For now, it is set to 2000 to process my 17 billion rows – Klun Sep 14 '21 at 21:55 (see the config sketch after these comments)
  • Did you find a solution for this? I have the exact same problem. It disappears on a restart of the cluster, but then the "Shuffle Read Blocked Time" gradually starts increasing with each job – vntzy Apr 27 '23 at 12:43
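
Following up on the spark.local.dir and shuffle-partition suggestions in the comments, here is a minimal sketch of where those settings would go, assuming a SparkSession-based job. The local path, the partition count, and the GC flag are illustrative assumptions, not values confirmed for this cluster:

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the tuning knobs discussed in the comments above.
// All values are illustrative, not measurements from this cluster.
val spark = SparkSession.builder()
  .appName("shuffle-tuning-sketch")
  // Point shuffle spill files at a real local disk instead of GPFS,
  // if the workers have any local storage (this path is hypothetical).
  .config("spark.local.dir", "/local/scratch/spark")
  // More shuffle partitions mean smaller blocks per task; the comments
  // suggest trying a value above the current 2000.
  .config("spark.sql.shuffle.partitions", "4000")
  // Optional: switch executors to G1GC to probe the GC hypothesis.
  .config("spark.executor.extraJavaOptions", "-XX:+UseG1GC")
  .getOrCreate()
```

Raising spark.sql.shuffle.partitions shrinks each shuffle block, which can cut per-task fetch time. Note that in standalone mode, spark.local.dir can be overridden by SPARK_LOCAL_DIRS set on the workers, so that change may need to go into the worker environment instead.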

0 Answers