
I get the following error when my Spark job fails: **"org.apache.spark.shuffle.FetchFailedException: The relative remote executor(Id: 21), which maintains the block data to fetch is dead."**

Overview of my Spark job:

Input size is ~35 GB.

I broadcast-joined all the smaller tables with the mother table into, say, dataframe1, and then I salted each big table and dataframe1 before joining it with dataframe1 (the left table).
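
For illustration, a minimal sketch of the salting pattern described above (the dataframe names, join key, and bucket count are hypothetical, not the actual code):

import pyspark.sql.functions as F

SALT_BUCKETS = 32  # hypothetical; tune to the observed skew

# Left side (dataframe1): tag each row with a random salt value 0..SALT_BUCKETS-1.
df1_salted = dataframe1.withColumn(
    "salt", (F.rand() * SALT_BUCKETS).cast("int")
)

# Right side (one of the big tables): replicate each row across all salt values
# so every salted key on the left can still find its match.
big_salted = big_table.withColumn(
    "salt", F.explode(F.array(*[F.lit(i) for i in range(SALT_BUCKETS)]))
)

# Join on the original key plus the salt, then drop the helper column.
joined = df1_salted.join(big_salted, on=["join_key", "salt"], how="left").drop("salt")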

Profile used:

@configure(profile=[
    'EXECUTOR_MEMORY_LARGE',
    'NUM_EXECUTORS_32',
    'DRIVER_MEMORY_LARGE',
    'SHUFFLE_PARTITIONS_LARGE'
])

Using the above approach and profiles, I was able to get the runtime down by 50%, but I still get "shuffle stage failing due to executor loss" issues.

Is there a way I can fix this?

  • It looks like you are substantially increasing your compute cost. For your input size, this feels like a problem with your code: maybe you have a join explosion, execution-heavy UDFs, or other similarly complex code. Normally, refactoring the code into simpler steps is the answer here. – fmsf Jan 26 '22 at 16:15
  • @fmsf I have salted all the bigger tables (rows > 1 million) and then joined them; this was to avoid skewed joins. All my tables are left-joined to the main df. Is there a way I can find where the data is exploding? – Arun Mohan Jan 27 '22 at 08:15
  • It may also be Spark failing to optimize your query and ending up with a very big query plan. I would recommend trying to break it into steps, and maybe use checkpoints as well if you can. What you are describing is more of a Spark question, not necessarily a Foundry one. – fmsf Jan 27 '22 at 10:26
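
A minimal sketch of the break-it-into-steps / checkpoint suggestion from the last comment, assuming plain PySpark inside the transform (the dataframe name is hypothetical):

# Materialize the intermediate result to truncate its lineage, so Spark
# optimizes each step's plan separately instead of one very large plan.
dataframe1 = dataframe1.localCheckpoint()

# A Foundry-specific alternative is to write dataframe1 to its own output
# dataset and read it back as the input of a downstream transform.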

1 Answer


There are multiple things you can try:

  1. Broadcast joins: if you have used broadcast hints to join multiple smaller tables, the resulting table (built from many smaller tables) might be too large to fit in each executor's memory, so you need to look at the total size of dataframe1.
  2. 35 GB is really not huge. Also try the profile "EXECUTOR_CORES_MEDIUM", which increases the parallelism of the computation. Use dynamic allocation (16 executors should be fine for 35 GB) rather than static allocation; with static allocation, if 32 executors are not available at once, the build doesn't start. "DRIVER_MEMORY_MEDIUM" should be enough.
  3. Spark 3.0 handles skewed joins by itself with Adaptive Query Execution (AQE), so you do not need the salting technique. There is a Foundry profile called "ADAPTIVE_ENABLED" that you can use. Other AQE settings have to be set manually through the "ctx" transform context object that is readily available in Foundry (see the sketch after this list).
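
A minimal sketch of how points 2 and 3 could be wired together in a Foundry transform. The dataset paths are placeholders, the dynamic-allocation profile is left as a comment because its exact name depends on your Foundry instance, and the AQE options set through ctx are standard Spark 3 settings:

from transforms.api import configure, transform_df, Input, Output

@configure(profile=[
    'EXECUTOR_CORES_MEDIUM',    # more parallelism in the computation (point 2)
    'DRIVER_MEMORY_MEDIUM',
    'SHUFFLE_PARTITIONS_LARGE',
    'ADAPTIVE_ENABLED',         # turns on Adaptive Query Execution (point 3)
    # add a dynamic-allocation profile available on your instance
    # in place of NUM_EXECUTORS_32
])
@transform_df(
    Output("/path/to/output_dataset"),      # placeholder path
    mother=Input("/path/to/mother_table"),  # placeholder path
)
def compute(ctx, mother):
    # Fine-tune AQE beyond what the ADAPTIVE_ENABLED profile sets, via the
    # Spark session exposed on the transform context. These are standard
    # Spark 3 settings; adjust the values to your data.
    spark = ctx.spark_session
    spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
    spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")

    # For point 1, the optimizer's size estimate can be inspected with
    # mother.explain(mode="cost") before relying on broadcast joins.

    return mother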

Some references for AQE:
  • https://learn.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/aqe
  • https://spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution

Narasimhan