
I have an Oozie workflow, with one of its steps being a Java step that runs a jar stored on the local filesystem (the jar is present on all nodes).

Initially, the jar was installed via an RPM, so all copies have the same timestamp.

While experimenting, I manually copied a new version over this jar, and I now get the message:

org.apache.oozie.action.ActionExecutorException: JA009: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1516602562532_15451 to YARN : Application application_1516602562532_15451 failed 2 times due to AM Container for appattempt_1516602562532_15451_000002 exited with  exitCode: -1000
For more detailed output, check the application tracking page: http://ip-10-0-0-239.eu-west-1.compute.internal:8088/cluster/app/application_1516602562532_15451 Then click on links to logs of each attempt.
Diagnostics: java.io.IOException: Resource file:/opt/tst/tst.jar changed on src filesystem (expected 1516886392000, was 1516891496000)
Failing this attempt. Failing the application.

The main line is:

Resource file:/opt/tst/tst.jar changed on src filesystem (expected 1516886392000, was 1516891496000).

The two numbers are timestamps: the expected one is indeed the timestamp of the old jar, identical on all servers; the was one is the timestamp of the new jar on one of the datanodes (as they were scp'ed in a loop, the timestamps differ slightly).
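
For the record, both values are Unix epoch timestamps in milliseconds: the mtime YARN recorded for the resource versus the mtime it now finds on the source filesystem. A throwaway snippet to decode them, purely for illustration:

    import java.time.Instant;

    public class DecodeTimestamps {
        public static void main(String[] args) {
            // Both values from the error message are epoch milliseconds (file mtimes).
            System.out.println(Instant.ofEpochMilli(1516886392000L)); // "expected": the old jar
            System.out.println(Instant.ofEpochMilli(1516891496000L)); // "was": the new jar, copied about 1.5 h later
        }
    }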

My question is: how do I tell YARN to stop whinging and use the new one already?

A few notes:

  • Hortonworks HDP 2.6, based on Hadoop 2.7,
  • the jar is put on the local FS only (by me), not in HDFS,
  • nothing to do with Spark (this error comes up a lot on Google in connection with Spark),
  • yarn.sharedcache.enabled is false (the default), so yarn scmadmin -runCleanerTask is not relevant here,
  • I could fix my current problem by reusing the old jar, and I could make sure that all DNs have the same TS (see the sketch after this list), but I wonder how I would ever be able to deploy a new version (note that the path Oozie points to is a symlink, so Oozie does not have to be updated when a new version is released),
  • I'd rather keep the file on the local FS instead of having to put it on HDFS,
  • the jar name is quite specific, so it does not clash with any other jar,
  • the workflow runs as user yarn, and I can't find any copy of my jar in the yarn user directory on HDFS (nor under the oozie directory, for that matter),
  • I can find copies of the jar under the YARN local dir /filecache, but their MD5 sums do not match any of my (current) versions.
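
To make the "same TS" idea above concrete: since the check compares modification times, one option would be to pin the jar's mtime to a single agreed value on every node right after copying it. A rough sketch of that idea only (hypothetical deploy helper; the path and the pinned value are just examples):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.attribute.FileTime;

    public class PinJarTimestamp {
        public static void main(String[] args) throws IOException {
            // Hypothetical step, run on each node after the jar is copied:
            // force one agreed-upon mtime so every node reports the same timestamp.
            Path jar = Paths.get("/opt/tst/tst.jar");   // path taken from the error message
            long pinnedMtimeMillis = 1516891496000L;    // any value, as long as it is identical everywhere
            Files.setLastModifiedTime(jar, FileTime.fromMillis(pinnedMtimeMillis));
            System.out.println("mtime now: " + Files.getLastModifiedTime(jar));
        }
    }
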
Guillaume
  • What specifically do you have against putting the jar on HDFS? Seems less overhead than looping an SCP job – OneCricketeer Jan 30 '18 at 06:21
  • @cricket_007 Our current deployment pipeline works with RPMs, where it is basically zero work for me to install a jar on all servers. Updating it to put the jar in HDFS is doable, of course, but I would rather not do the extra work; my todo list is big enough as it is. Furthermore, I really would like to understand the core of the problem. – Guillaume Jan 30 '18 at 06:48
  • @Guillaume I'm coming across the same kind of issue. Did you figure it out, in the end? – Poorkenny Apr 13 '18 at 08:54
  • @Poorkenny The master plan ended up being 'just wait a bit'. In my case the jar is tiny and not updated often, so pragmatically this is the best for us. Sorry, this is probably not the answer you hoped for. – Guillaume Apr 13 '18 at 09:37
  • @Guillaume Indeed, it's not ^^ Thanks for answering, though. – Poorkenny Apr 13 '18 at 14:29

2 Answers


Here are my two cents: you can build the YARN-related jar yourself and add it to your current working environment.

It could be a workaround to skip this "annoying" condition check.

Generally, the steps are as below:

1. Get the source code of the YARN version you use; you can download it from the official Hadoop site.
2. Search for the error message, e.g. "changed on src filesystem", in the Hadoop source code.
3. Comment that check out (a paraphrased sketch of it follows below).
4. Rebuild the YARN-related jar.
5. Put it into your working environment.
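
For what it's worth, the guard in question lives in the YARN resource localizer (in Hadoop 2.7 it sits in org.apache.hadoop.yarn.util.FSDownload in hadoop-yarn-common). The sketch below is only a paraphrase of that kind of check, not the exact source, so look it up in the version you actually build:

    import java.io.IOException;

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Paraphrased sketch of the timestamp check the steps above suggest removing.
    // Names are approximate; check FSDownload in your Hadoop version for the real code.
    public class TimestampGuardSketch {
        static void verify(FileSystem srcFs, Path resource, long expectedMtime) throws IOException {
            FileStatus stat = srcFs.getFileStatus(resource);
            if (stat.getModificationTime() != expectedMtime) {
                // This is the line behind "Resource ... changed on src filesystem (expected ..., was ...)".
                throw new IOException("Resource " + resource + " changed on src filesystem (expected "
                        + expectedMtime + ", was " + stat.getModificationTime() + ")");
            }
        }
    }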

You can refer to How to fix resource changed on src filesystem issue for more details.

Eugene

I encountered the same error, in my case with an output folder (Resource path/to/output/folder changed on src filesystem (expected 1583243472154, was 1583243577395)), when running a Pig script in an Oozie workflow.

Removing the .staging folder fixed my problem.