
I am trying to forecast sales using Prophet on my Databricks cluster through a Grouped Map Pandas UDF. The problem is that each time I run it, one or two executors get stuck running their last set of tasks (8 tasks per executor, i.e. two waves over their 4 cores), while the rest run normally, checking off 8 at a time.
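For context, the job follows the standard grouped-map pattern: Spark calls a pandas function once per group key. Below is a minimal, runnable sketch of such a per-group function. All names (`store_id`, `date`, `sales`, the 7-day horizon) are hypothetical, and the Prophet fit is stubbed with a naive mean forecast so the sketch runs without Prophet installed; the commented lines show where the real Prophet calls would go.

```python
import pandas as pd

def forecast_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Per-group function that Spark would invoke via
    df.groupBy("store_id").applyInPandas(forecast_group, schema=...).

    The real version would fit Prophet on the group's history:
        # m = Prophet()
        # m.fit(pdf.rename(columns={"date": "ds", "sales": "y"}))
        # future = m.make_future_dataframe(periods=horizon)
        # forecast = m.predict(future)
    Here a naive mean forecast stands in for Prophet so the sketch runs.
    """
    horizon = 7  # hypothetical forecast horizon in days
    last_date = pdf["date"].max()
    future_dates = pd.date_range(last_date + pd.Timedelta(days=1), periods=horizon)
    yhat = pdf["sales"].mean()  # stand-in for Prophet's yhat
    return pd.DataFrame({
        "store_id": pdf["store_id"].iloc[0],
        "date": future_dates,
        "yhat": yhat,
    })

# Local check of the same logic Spark runs per group:
hist = pd.DataFrame({
    "store_id": 1,
    "date": pd.date_range("2022-01-01", periods=30),
    "sales": range(30),
})
out = hist.groupby("store_id", group_keys=False).apply(forecast_group)
```

Note that with this pattern each group is processed as one task on one core, so a few unusually large groups can leave a handful of tasks running long after the rest have finished.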

[Screenshot: Aggregated Metrics per Executor]

Here is the error message I see in the stderr of each hanging executor: [Screenshot: executor stderr]

Also, before executing the Pandas UDF, I repartitioned my dataset into 64 partitions.

While running the Spark job, I noticed that after a few stages complete, tasks sit idle for some time and then start again.

Spark version - 3.2.1 (Databricks Runtime 10.4 LTS)

Total executors - 8

Total cores - 32 (4 per executor)

Total memory - 64 GB (8 GB per executor)

Is there any reason why tasks would stay in the running state while actually hanging?

If so, what could be the cause?

Thank you.

[Screenshot: stderr log of a hanging executor]
