I am running a job in spark-shell with the following configuration:

    --num-executors 15
    --driver-memory 15G
    --executor-memory 7G
    --executor-cores 8
    --conf spark.yarn.executor.memoryOverhead=2G
    --conf spark.sql.shuffle.partitions=500
    --conf spark.sql.autoBroadcastJoinThreshold=-1
    --conf spark.executor.memoryOverhead=800
The job is stuck and does not start. The code does a cross join with filter conditions between a large table (~270M rows) and a small table (~100k rows). I have increased the large table's partitions to 16,000, and I have converted the small table to a broadcast variable.
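For context, the point of broadcasting the small table is to replace the 270M × 100k row pair explosion of a cross join + filter with one hash lookup per large-table row. A minimal plain-Scala sketch of that idea (no Spark, and the row types and join key here are hypothetical stand-ins, assuming the filter condition contains at least one equality that can act as a key):

```scala
// Hypothetical row types standing in for the real tables.
case class Big(key: String, payload: Int)
case class Small(key: String, factor: Int)

object BroadcastJoinSketch {
  // Build a lookup map from the small (~100k row) side, analogous to the
  // hash table a broadcast join ships to every executor.
  def join(big: Seq[Big], small: Seq[Small]): Seq[(Big, Small)] = {
    val lookup: Map[String, Small] = small.map(s => s.key -> s).toMap
    // One map probe per big-side row: O(|big|) work instead of the
    // O(|big| * |small|) candidate pairs a cross join + filter produces.
    big.flatMap(b => lookup.get(b.key).map(s => (b, s)))
  }

  def main(args: Array[String]): Unit = {
    val big   = Seq(Big("a", 1), Big("b", 2), Big("x", 3))
    val small = Seq(Small("a", 10), Small("b", 20))
    val joined = join(big, small)
    // "x" has no match in the small side, so only two pairs survive.
    println(joined.map { case (b, s) => s"${b.key}:${b.payload * s.factor}" }.mkString(","))
  }
}
```

If the filter really has no usable equality condition, the broadcast does not remove the pair explosion, it only avoids a shuffle of the large side.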
I have attached the Spark UI screenshots for the job. Should I reduce the number of partitions or increase the number of executors? Any ideas? Thanks for helping out.
![spark ui 1][1] ![spark ui 2][2] ![spark ui 3][3]

After 10 hours the status is: tasks 7341/16936 (16624 failed).
Checking the container error logs shows:
    Failed while trying to construct the redirect url to the log server. Log Server url may not be configured
    java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.
Screenshots at ~50% completion: [50% completed ui 1][4] [50% completed ui 2][5]

  [1]: https://i.stack.imgur.com/nqcys.png
  [2]: https://i.stack.imgur.com/S2vwL.png
  [3]: https://i.stack.imgur.com/81FUn.png
  [4]: https://i.stack.imgur.com/h5MTa.png
  [5]: https://i.stack.imgur.com/yDfKF.png