2

I get an intermittent FileNotFoundException error when I run a query in Hive using the Tez engine.

ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1508808910527_45616_1_00, diagnostics=[Task failed, taskId=task_1508808910527_45616_1_00_000066, diagnostics=[TaskAttempt 0 failed, info=[Container container_e09_1508808910527_45616_01_000033 finished with diagnostics set to [Container failed, exitCode=-1000. File does not exist: hdfs://server02.corp.company.com:8020/tmp/hive/username/_tez_session_dir/b65ddde9-110e-47fc-ae1c-33a1f754f839/nzcodec.jar
java.io.FileNotFoundException: File does not exist: hdfs://server02.corp.company.com:8020/tmp/hive/username/_tez_session_dir/b65ddde9-110e-47fc-ae1c-33a1f754f839/nzcodec.jar
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

The query selects data from a staging table, repartitions it and writes it to a reporting table.

INSERT OVERWRITE TABLE ${reporting_table} PARTITION (day, app_name) select <all the fields> from ${staging_table} where day = '${day}'

The staged data is stored in Avro files is 350GB

hadoop fs -du -h -s /staged-data/2017-11-02
350.7 G /staged-data/2017-11-02

I've run the same query on the same set of data multiple times and the failure is intermittent.

My yarn settings look like this:

yarn.nodemanager.resource.memory-mb     83968
yarn.scheduler.minimum-allocation-mb    2048

My Tez settings on the query look like this:

SET hive.execution.engine=tez;
SET tez.am.resource.memory.mb=2048;
SET hive.tez.container.size=2048;

SET hive.merge.tezfiles=true;
SET hive.merge.smallfiles.avgsize=128000000;
SET hive.merge.size.per.task=128000000;

I've worked through the suggestions on https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html but I'm still seeing this problem. Adjusting the container size doesn't seem to help.

Is there another set of settings I can modify to prevent this?

s d
  • 2,666
  • 4
  • 26
  • 42

0 Answers0