I am running a spark application where data comes in every 1 minute. No of repartitions i am doing is 48. It is running on 12 executor with 4G as executor memory and executor-cores=4.
Below are the streaming batches processing time
Here we can see that some of the batches are taking around 20 sec but some are taking around 45 sec
I further drilled down in one of the batch which is taking less time. Below is the image.
and the one which is taking more time.
Here we can see more time is taken in repartitioning task, but above one was not taking much time in repartitioning. Its happening with every 3-4 batch. The data is coming from kafka Stream and has only value, no key.
Is there any reason related to spark configuration?