
I upload Scala/Spark jars to HDFS to test them on our cluster. After running, I frequently realize there are changes that need to be made, so I make the changes locally and push the new jar back up to HDFS. However, often (but not always) when I do this, Hadoop throws an error essentially saying that this jar is not the same as the old jar (duh).

I've tried clearing my Trash, .staging, and .sparkstaging directories, but that doesn't do anything. I've tried renaming the jar, which works sometimes and other times doesn't (it's still ridiculous that I have to do this in the first place).

Does anyone know why this is occurring and how I can prevent it? Thanks for any help. Here are some logs in case they help (I've edited out some paths):

Application application_1475165877428_124781 failed 2 times due to AM Container for appattempt_1475165877428_124781_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://examplelogsite/ Then, click on links to logs of each attempt.
Diagnostics: Resource MYJARPATH/EXAMPLE.jar changed on src filesystem (expected 1475433291946, was 1475433292850
java.io.IOException: Resource MYJARPATH/EXAMPLE.jar changed on src filesystem (expected 1475433291946, was 1475433292850
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
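For what it's worth, the two numbers in the diagnostics look like epoch-millisecond modification times of the jar: decoding them (a quick sketch, assuming they are millisecond timestamps) shows the jar's modification time moved by under a second between what YARN recorded at submission and what it found during localization:

```java
import java.time.Instant;

public class TimestampDiff {
    public static void main(String[] args) {
        // The two values from the "changed on src filesystem" diagnostics,
        // interpreted as epoch milliseconds.
        long expected = 1475433291946L; // timestamp YARN recorded at submission
        long actual   = 1475433292850L; // timestamp found during localization

        System.out.println("expected: " + Instant.ofEpochMilli(expected));
        System.out.println("actual:   " + Instant.ofEpochMilli(actual));
        System.out.println("diff ms:  " + (actual - expected)); // 904 ms apart
    }
}
```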

gsamaras
David Schuler

2 Answers


I haven't seen that exit code before, so it doesn't tell me anything. I would suggest you check the logs, like this:

yarn logs -applicationId <your_application_ID>
gsamaras
  • This is the weird thing: I'm running this via an Oozie workflow, and neither the Oozie job nor the Spark job has any logs in the typical place. I'm just getting the above log through Hue – David Schuler Oct 02 '16 at 18:56

According to your log, I'm sure the error comes from the YARN side.
As a workaround, you can modify YARN yourself to skip this exception.
I ran into this thread because of the same "changed on src filesystem" error; I hit the issue and skipped it by modifying the YARN source code.
For more details, you can refer to how-to-fix-resource-changed-on-src-filesystem-issue
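As a rough sketch of what this means (simplified, not the actual Hadoop source): during localization YARN compares the resource's current modification time against the one recorded at submission and throws if they differ; the workaround relaxes that check to a warning. Both method names below are illustrative, not Hadoop APIs:

```java
import java.io.IOException;

public class TimestampCheck {
    // Sketch of the kind of check FSDownload performs during localization.
    // YARN records the jar's modification time when the job is submitted
    // and refuses to localize it if the time has changed since.
    static void verifyTimestamp(String path, long expected, long actual)
            throws IOException {
        if (actual != expected) {
            throw new IOException("Resource " + path
                + " changed on src filesystem (expected " + expected
                + ", was " + actual + ")");
        }
    }

    // The workaround described above: demote the failure to a warning
    // so localization continues with the newer jar.
    static void verifyTimestampLenient(String path, long expected, long actual) {
        if (actual != expected) {
            System.err.println("WARN: resource " + path
                + " changed on src filesystem (expected " + expected
                + ", was " + actual + "); continuing anyway");
        }
    }
}
```

Note that patching YARN this way trades safety for convenience: the check exists so every container localizes the exact bytes the client submitted.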

Eugene