3

I'm running SQL with Hive on MapReduce, and the job fails partway through with the error below:

Application application_1570514228864_0001 failed 2 times due to AM Container for appattempt_1570514228864_0001_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2019-10-08 13:57:49.272]Failed to download resource { { s3a://tpcds/tmp/hadoop-yarn/staging/root/.staging/job_1570514228864_0001/libjars, 1570514262820, FILE, null },pending,[(container_1570514228864_0001_02_000001)],1132444167207544,DOWNLOADING} java.io.IOException: Resource s3a://tpcds/tmp/hadoop-yarn/staging/root/.staging/job_1570514228864_0001/libjars changed on src filesystem (expected 1570514262820, was 1570514269265

As I read it, the key message in the error log is "libjars changed on src filesystem (expected 1570514262820, was 1570514269265". There are several threads about this issue on SO, such as thread1 and thread2, but none of them has been answered yet.
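To make the two numbers in that message easier to read, here is a minimal sketch that treats them as epoch milliseconds (which is what `FileStatus.getModificationTime()` returns) and prints them as UTC instants along with their difference. The class and method names are hypothetical and only serve the illustration.

```java
import java.time.Instant;

// Hypothetical helper, not part of Hadoop: decodes the two timestamps from the error.
class TimestampDiff {
    // Milliseconds between the timestamp YARN recorded and the one it found later.
    static long diffMillis(long expected, long actual) {
        return actual - expected;
    }

    public static void main(String[] args) {
        long expected = 1570514262820L; // "expected" value from the error message
        long actual = 1570514269265L;   // "was" value from the error message
        System.out.println("expected: " + Instant.ofEpochMilli(expected));
        System.out.println("actual:   " + Instant.ofEpochMilli(actual));
        System.out.println("diff ms:  " + diffMillis(expected, actual));
    }
}
```

The two values are 6445 ms apart, so the modification time reported for libjars at download time is a few seconds later than the one recorded at submission time.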

I found some useful information in the Apache JIRA and the Red Hat Bugzilla, and I synced the clocks on all related nodes with NTP, but the issue is still there.

Any comments are welcome, thanks.

Eugene

3 Answers

3

I still don't know why the resource file's timestamp is inconsistent, and as far as I know there is no configuration setting that fixes it.

However, I found a workaround that skips the check. Let me summarize it here for anyone who runs into the same issue.

Searching the Hadoop source code for the error message leads to verifyAndCopy in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java.

Comment out the statement that throws the exception:

  private void verifyAndCopy(Path destination)
      throws IOException, YarnException {
    final Path sCopy;
    try {
      sCopy = resource.getResource().toPath();
    } catch (URISyntaxException e) {
      throw new IOException("Invalid resource", e);
    }
    FileSystem sourceFs = sCopy.getFileSystem(conf);
    FileStatus sStat = sourceFs.getFileStatus(sCopy);
    if (sStat.getModificationTime() != resource.getTimestamp()) {
      // Workaround: the original code threw an IOException here. It is
      // commented out so that a timestamp mismatch no longer fails the download.
      // throw new IOException("Resource " + sCopy +
      //     " changed on src filesystem (expected " + resource.getTimestamp() +
      //     ", was " + sStat.getModificationTime());
      LOG.debug("[Gearon][Info] The timestamp is not consistent among resource files.\n" +
          "Stop throwing exception. It doesn't affect other modules.");
    }
    if (resource.getVisibility() == LocalResourceVisibility.PUBLIC) {
      if (!isPublic(sourceFs, sCopy, sStat, statCache)) {
        throw new IOException("Resource " + sCopy +
            " is not publicly accessible and as such cannot be part of the" +
            " public cache.");
      }
    }

    downloadAndUnpack(sCopy, destination);
  }

Build hadoop-yarn-project and copy `hadoop-yarn-common-x.x.x.jar` to `$HADOOP_HOME/share/hadoop/yarn`.

I'll leave this thread open. Thanks in advance for any explanation of how to fix this without changing the Hadoop source.

Eugene
Same problem on OS X 10.15.4. It looks like `resource.getTimestamp()` drops the milliseconds that `FileStatus` holds. – toien Aug 09 '20 at 12:13
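The effect described in this comment can be sketched in plain Java. This is an illustration only: the class and both methods are hypothetical, and the truncation to whole seconds is an assumption standing in for whatever drops the milliseconds. The strict inequality mirrors the `!=` comparison in verifyAndCopy.

```java
// Hypothetical illustration, not Hadoop code.
class TruncationDemo {
    // Drops the millisecond part, as a seconds-resolution store would (assumed here).
    static long truncateToSeconds(long epochMillis) {
        return (epochMillis / 1000L) * 1000L;
    }

    // Same strict test as verifyAndCopy: any difference counts as "changed".
    static boolean looksChanged(long recorded, long observed) {
        return recorded != observed;
    }

    public static void main(String[] args) {
        long observed = 1570514262820L;               // full-precision modification time
        long recorded = truncateToSeconds(observed);  // the same instant without milliseconds
        System.out.println("recorded: " + recorded);
        System.out.println("observed: " + observed);
        System.out.println("changed?  " + looksChanged(recorded, observed));
    }
}
```

Under that assumption, an unmodified file is still reported as changed whenever its real modification time has a non-zero millisecond part.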
1

I had to do the same. This should be configurable, because even a small latency makes the execution fail. It can happen when you switch the Hadoop file system to S3 and run an MR program. Note: make sure you build the jar with the same JDK version that the Apache Hadoop docs specify for your Hadoop version, otherwise you might run into errors.

Prabhat jha
0

You can actually fix this by manually setting the file's modification time back to the expected value, for example with the touch command and its -t STAMP or -d DATE option:

sudo touch -d '07 Apr 2022 11:12:30.000 +0000' '<path_to_file>'
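The same reset can be sketched in Java with `Files.setLastModifiedTime`, which takes the value as epoch milliseconds, the unit used in the "expected" field of the error message. This assumes the resource is a file on a local file system that the process may modify. The class name, the method name, and the temporary file are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical helper: sets a local file's modification time to a given epoch-millisecond value.
class ResetMtime {
    static void resetModifiedTime(Path file, long epochMillis) throws IOException {
        Files.setLastModifiedTime(file, FileTime.fromMillis(epochMillis));
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real resource in this sketch.
        Path file = Files.createTempFile("mtime-demo", ".jar");
        resetModifiedTime(file, 1570514262000L);
        System.out.println(Files.getLastModifiedTime(file).toMillis());
        Files.delete(file);
    }
}
```

Some file systems store modification times with coarser than millisecond precision, so the value read back may be rounded.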
Edoardo Basili