I am trying to forecast sales using Prophet on my Databricks cluster through a grouped map Pandas UDF. The problem is that each time I run it, one or two executors get stuck running their last set of tasks (8 tasks per executor, spread across their 4 cores), while the rest run normally, completing 8 at a time.
Aggregated Metrics per executor
Here is the error message I see in the stderr of each hanging executor: [enter image description here][2]
Also, before executing the Pandas UDF, I repartitioned my dataset into 64 partitions.
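For context, the per-group function I pass to the grouped map UDF has roughly this shape. This is a simplified sketch, not my actual job: the Prophet fit/predict step is replaced with a naive mean forecast so the snippet runs standalone, and the `store`/`ds`/`y` column names are placeholders (`ds`/`y` follow Prophet's expected schema).

```python
import pandas as pd

def forecast_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Forecast one group's sales.

    In the real job this fits a Prophet model on the group's history;
    here a naive mean forecast stands in so the sketch runs without
    Spark or Prophet installed.
    """
    horizon = 7  # days to forecast per group
    last_date = pdf["ds"].max()
    # Future dates start the day after the last observed date
    future = pd.date_range(last_date, periods=horizon + 1, freq="D")[1:]
    yhat = pdf["y"].mean()  # placeholder for model.predict(...)
    return pd.DataFrame({
        "store": pdf["store"].iloc[0],
        "ds": future,
        "yhat": yhat,
    })

# Emulate Spark's grouped-map behaviour locally with a pandas groupby:
# Spark would call forecast_group once per group, on the executor
# that holds that group's partition.
hist = pd.DataFrame({
    "store": ["A"] * 5 + ["B"] * 5,
    "ds": pd.to_datetime(
        ["2022-01-01", "2022-01-02", "2022-01-03",
         "2022-01-04", "2022-01-05"] * 2),
    "y": [10, 12, 11, 13, 12, 100, 110, 105, 115, 108],
})
out = pd.concat(
    [forecast_group(g) for _, g in hist.groupby("store")],
    ignore_index=True,
)
print(out)
```

In the real pipeline this function is applied per group on the repartitioned DataFrame, which is why uneven group sizes (one store with far more history than the rest) could leave a few executors grinding on their last tasks while the others finish.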
While running the Spark job, I also noticed that after a few stages completed, tasks sat idle for some time and then started again.
Spark version - 3.2.1 (Databricks Runtime 10.4 LTS)
Total executors - 8
Total cores - 32 (4 for each executor)
Total memory - 64 GB (8 GB per executor)
Is there any reason why tasks would stay in the running state while actually hanging? If so, what could be causing it?
Thank you.