- config `spark.yarn.maxAppAttempts = 2` (or `yarn.resourcemanager.am.max-attempts = 2`)
- I call `df.cache()` in some stage and that stage finishes
- Then the first attempt fails for whatever reason (some GC memory failure, for example)
Does the next attempt take advantage of the already computed cached data, or is it a completely new, separate computation?
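For reference, a minimal sketch of the kind of job I have in mind (the input path and column names are just placeholders, not my actual job):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cache-across-attempts")
  // Same setting as above; could also be passed via --conf on spark-submit.
  .config("spark.yarn.maxAppAttempts", "2")
  .getOrCreate()
import spark.implicits._

// Placeholder input and filter, standing in for some expensive earlier stages.
val df = spark.read.parquet("/data/events.parquet")
  .filter($"status" === "ok")

// Cached after those stages complete; this is the data I'd like the
// second application attempt to reuse, if that is possible at all.
df.cache()
df.count() // materializes the cache

// ... later stages, during which attempt 1 eventually fails (e.g. GC trouble) ...
```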
Related, but not exactly the same: How to limit the number of retries on Spark job failure?