- config `spark.yarn.maxAppAttempts = 2` (or `yarn.resourcemanager.am.max-attempts = 2`)
- I call `df.cache()` in some stage and that stage finishes
- Then the first attempt fails for whatever reason (some GC memory failure, for example)
Does the next attempt take advantage of the already computed cached data, or is it a completely new, separate computation?
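For reference, a minimal sketch of the kind of job I have in mind (the input path and column names are just placeholders, not my actual job):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cache-across-attempts")
  // Same setting as above; could also be passed via --conf on spark-submit.
  .config("spark.yarn.maxAppAttempts", "2")
  .getOrCreate()
import spark.implicits._

// Placeholder input and filter, standing in for some expensive earlier stages.
val df = spark.read.parquet("/data/events.parquet")
  .filter($"status" === "ok")

// Cached after those stages complete; this is the data I'd like the
// second application attempt to reuse, if that is possible at all.
df.cache()
df.count() // materializes the cache

// ... later stages, during which attempt 1 eventually fails (e.g. GC trouble) ...
```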
Related, but not exactly the same: How to limit the number of retries on Spark job failure?