I get the following error when my Spark job fails: **"org.apache.spark.shuffle.FetchFailedException: The relative remote executor(Id: 21), which maintains the block data to fetch is dead."**
Overview of my Spark job:

The input size is ~35 GB. I broadcast-joined all the smaller tables with the mother table into, say, dataframe1, and then salted each big table and dataframe1 before joining each big table with dataframe1 (the left table), roughly as in the sketch below.
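For reference, the approach looks roughly like this in PySpark. This is a minimal sketch of one common salting variant; mother_df, small_df1, small_df2, big_df, join_key, and the salt range are placeholders, not the job's actual schema:

```python
from pyspark.sql import functions as F

# mother_df, small_df1, small_df2, big_df: already-loaded DataFrames (placeholders)
SALT_BUCKETS = 32  # assumed salt range

# Step 1: broadcast-join the small dimension tables into the mother table.
dataframe1 = (
    mother_df
    .join(F.broadcast(small_df1), "key1", "left")
    .join(F.broadcast(small_df2), "key2", "left")
)

# Step 2: salt both sides before the big join.
# The left side gets a random salt; the right side is exploded across all
# salt values so every salted left key still finds its matches.
dataframe1_salted = dataframe1.withColumn(
    "salt", (F.rand() * SALT_BUCKETS).cast("int")
)
big_salted = big_df.withColumn(
    "salt", F.explode(F.array([F.lit(i) for i in range(SALT_BUCKETS)]))
)

# Step 3: join on the original key plus the salt so skewed keys are spread
# across more shuffle partitions.
result = dataframe1_salted.join(big_salted, ["join_key", "salt"], "left")
```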
Profile used:

```python
@configure(profile=[
    'EXECUTOR_MEMORY_LARGE',
    'NUM_EXECUTORS_32',
    'DRIVER_MEMORY_LARGE',
    'SHUFFLE_PARTITIONS_LARGE'
])
```
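For context, the profile decorator sits on the transform roughly like this. This is only a sketch; the dataset paths, input names, and the body of the transform are placeholders for my actual job:

```python
from transforms.api import configure, transform_df, Input, Output


@configure(profile=[
    'EXECUTOR_MEMORY_LARGE',
    'NUM_EXECUTORS_32',
    'DRIVER_MEMORY_LARGE',
    'SHUFFLE_PARTITIONS_LARGE',
])
@transform_df(
    Output("/path/to/output_dataset"),
    mother=Input("/path/to/mother_table"),
    big=Input("/path/to/big_table"),
)
def compute(mother, big):
    # The broadcast joins and the salted join from the sketch above go here.
    return mother.join(big, "join_key", "left")
```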
Using the above approach and profiles, I was able to cut the runtime by about 50%, but I still hit shuffle stage failures due to executor loss.

Is there a way I can fix this?