
Whenever I run this Spark job with the parameters below, it slows down.

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.sql.shuffle.partitions=100 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=30 \
  --num-executors 5 \
  --executor-cores 5 \
  --executor-memory 17g \
  --conf Spark.Dynamic.executors=true
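
As a side note on the submit command above: spark-submit does not validate --conf keys, and Spark.Dynamic.executors does not look like a standard Spark property (dynamic allocation is already governed by the spark.dynamicAllocation.* settings), so it is most likely ignored. A minimal Scala sketch, runnable at the start of the job, to confirm which settings actually took effect:

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: print the settings that actually took effect for this run.
    val spark = SparkSession.builder().getOrCreate()

    Seq(
      "spark.sql.shuffle.partitions",
      "spark.dynamicAllocation.enabled",
      "spark.dynamicAllocation.minExecutors",
      "spark.dynamicAllocation.maxExecutors",
      "Spark.Dynamic.executors"   // not a standard property; included only to illustrate the check
    ).foreach { key =>
      println(s"$key = ${spark.sparkContext.getConf.getOption(key).getOrElse("<not set>")}")
    }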

I have a script that writes data to 14 tables, and the job normally takes barely 5 minutes to complete since it runs incrementally. It gets stuck on one table: that step sometimes takes almost 3 hours to complete, while on other days it finishes within seconds. Below is the DAG of the job that is consuming the time:

DAG of the job consuming time

DAG of the job consuming time (remaining part of the screenshot)

DAG of the job when it completes within 5 minutes

  • The interesting thing is that the first screenshot shows a shuffle happening in a single partition of the data, where it is reading 300K rows in one partition. The other partitions are not reading anything and are not running a shuffle. You might need to check your partitioning (see the repartitioning sketch after these comments). – Thiago Baldim Jul 15 '22 at 04:23
  • Yeah, correct. But prior to that, in the last screenshot, it completed in 5 minutes with no load. Maybe I am wrong, but do we really need to work on the partitioning side? The table on which it gets stuck has no partition column, it is a Kudu table, and the total number of records in the table is 2,939,544 rows. – Ashish Rana Jul 15 '22 at 05:28
  • I have improved the execution time by following the above point. Now I am sometimes getting the error below (see the second sketch after these comments): java.lang.RuntimeException: Failed to write 32 rows to Kudu; Sample errors: Timed out: cannot complete before timeout – Ashish Rana Jul 16 '22 at 04:01
  • Hmm, that might be related to the Kudu configuration itself. Probably nothing related to Spark, as far as I know. – Thiago Baldim Jul 17 '22 at 23:18
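
Regarding the single hot partition noted in the first comment, here is a hedged sketch of one way to spread the skewed write across tasks before it reaches Kudu. It is not taken from the question's script; the source table, key column, Kudu master address, and Kudu table name are placeholders, and the exact data source options should be checked against the kudu-spark version in use.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().getOrCreate()

    // Placeholder source: the incremental batch that currently funnels ~300K rows
    // through a single task.
    val incoming = spark.table("staging_db.incoming_increment")

    // Hash-repartition on the Kudu primary-key column(s) so the write is spread
    // across many tasks instead of one.
    val balanced = incoming.repartition(50, col("id"))   // "id" is a placeholder key column

    balanced.write
      .format("kudu")                                    // kudu-spark data source; short-name support depends on the version
      .option("kudu.master", "kudu-master:7051")         // placeholder master address
      .option("kudu.table", "impala::db.kudu_table")     // placeholder table name
      .mode("append")
      .save()

Several evenly loaded write tasks are usually gentler on the Kudu tablet servers than one task carrying the whole increment.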
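
On the follow-up "Failed to write 32 rows to Kudu ... Timed out: cannot complete before timeout" error, which, as the last comment says, is more of a Kudu-side symptom than a Spark one: below is a sketch of two possible mitigations, writing from fewer concurrent tasks and raising the client operation timeout. The kudu.operationTimeoutMs option is an assumption about the kudu-spark connector and should be verified against its documentation; all names are placeholders.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    val incoming = spark.table("staging_db.incoming_increment")   // placeholder source

    incoming
      .coalesce(10)                                  // fewer simultaneous writers per batch
      .write
      .format("kudu")
      .option("kudu.master", "kudu-master:7051")     // placeholder master address
      .option("kudu.table", "impala::db.kudu_table") // placeholder table name
      .option("kudu.operationTimeoutMs", "60000")    // assumed option name; verify in the kudu-spark docs
      .mode("append")
      .save()

If timeouts persist even with modest parallelism, load on the Kudu tablet servers themselves is worth checking, which matches the suggestion in the final comment.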

0 Answers