
I have set up a 2-node cluster of Hadoop 2.3.0. It is working fine and I can successfully run the distributedshell-2.2.0.jar example, but when I try to run any MapReduce job I get an error. I have set up mapred-site.xml and the other configs for running MapReduce jobs according to (http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide), but I am getting the following error:

14/03/22 20:31:17 INFO mapreduce.Job: Job job_1395502230567_0001 failed with state FAILED due to: Application application_1395502230567_0001 failed 2 times due to AM Container for appattempt_1395502230567_0001_000002 exited 
with  exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: 
    org.apache.hadoop.util.Shell$ExitCodeException: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
        at org.apache.hadoop.util.Shell.run(Shell.java:418)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)


    Container exited with a non-zero exit code 1
    .Failing this attempt.. Failing the application.
    14/03/22 20:31:17 INFO mapreduce.Job: Counters: 0
    Job ended: Sat Mar 22 20:31:17 PKT 2014
    The job took 6 seconds.

And if I look at stderr (the log of the job) there is only one line: "Could not find or load main class 614"

Now, I have googled it, and usually this issue comes up when you have different Java versions or when the classpath in yarn-site.xml is not set properly. My yarn-site.xml has this:

  <property>
    <name>yarn.application.classpath</name>
    <value>/opt/yarn/hadoop-2.3.0/etc/hadoop,/opt/yarn/hadoop-2.3.0/*,/opt/yarn/hadoop-2.3.0/lib/*,/opt/yarn/hadoop-2.3.0/*,/opt/yarn/hadoop-2.3.0/lib/*,/opt/yarn/hadoop-2.3.0/*,/opt/yarn/hadoop-2.3.0/lib/*,/opt/yarn/hadoop-2.3.0/*,/opt/yarn/hadoop-2.3.0/lib/*</value>
  </property>
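
For comparison, if I read yarn-default.xml correctly, the default value uses environment variables and points into the share/hadoop/... directories rather than the install root (so maybe my hard-coded entries are missing those subdirectories?):

  <property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*</value>
  </property>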

So, any other ideas about what the issue could be here?

I am running my MapReduce job like this:

$HADOOP_PREFIX/bin/hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter out
TonyMull
  • I have tried hadoop 2.2.0 and 2.3.0 but same error ! – TonyMull Mar 22 '14 at 16:00
  • see this link http://stackoverflow.com/questions/20390217/mapreduce-job-in-headless-environment-fails-n-times-due-to-am-container-exceptio/39383907#39383907 – aibotnet Sep 08 '16 at 06:28

11 Answers


I encountered the same problem when trying to install Hortonworks HDP 2.1 manually. I managed to capture the container launcher script which contained the following:

#!/bin/bash

export NM_HTTP_PORT="8042"
export LOCAL_DIRS="/data/1/hadoop/yarn/local/usercache/doug/appcache/application_1406927878786_0001,/data/2/hadoop/yarn/local/usercache/doug/appcache/application_1406927878786_0001,/data/3/hadoop/yarn/local/usercache/doug/appcache/application_1406927878786_0001,/data/4/hadoop/yarn/local/usercache/doug/appcache/application_1406927878786_0001"
export JAVA_HOME="/usr/java/latest"
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="
export CLASSPATH="$PWD:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:job.jar/job.jar:job.jar/classes/:job.jar/lib/*:$PWD/*"
export HADOOP_TOKEN_FILE_LOCATION="/data/2/hadoop/yarn/local/usercache/doug/appcache/application_1406927878786_0001/container_1406927878786_0001_01_000001/container_tokens"
export NM_HOST="test02.admin.hypertable.com"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1406927878786_0001"
export JVM_PID="$$"
export USER="doug"
export HADOOP_HDFS_HOME="/usr/lib/hadoop-hdfs"
export PWD="/data/2/hadoop/yarn/local/usercache/doug/appcache/application_1406927878786_0001/container_1406927878786_0001_01_000001"
export CONTAINER_ID="container_1406927878786_0001_01_000001"
export HOME="/home/"
export NM_PORT="62404"
export LOGNAME="doug"
export APP_SUBMIT_TIME_ENV="1406928095871"
export MAX_APP_ATTEMPTS="2"
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export MALLOC_ARENA_MAX="4"
export LOG_DIRS="/data/1/hadoop/yarn/logs/application_1406927878786_0001/container_1406927878786_0001_01_000001,/data/2/hadoop/yarn/logs/application_1406927878786_0001/container_1406927878786_0001_01_000001,/data/3/hadoop/yarn/logs/application_1406927878786_0001/container_1406927878786_0001_01_000001,/data/4/hadoop/yarn/logs/application_1406927878786_0001/container_1406927878786_0001_01_000001"
ln -sf "/data/1/hadoop/yarn/local/usercache/doug/filecache/10/libthrift-0.9.2.jar" "libthrift-0.9.2.jar"
ln -sf "/data/4/hadoop/yarn/local/usercache/doug/appcache/application_1406927878786_0001/filecache/13/job.xml" "job.xml"
mkdir -p jobSubmitDir
ln -sf "/data/3/hadoop/yarn/local/usercache/doug/appcache/application_1406927878786_0001/filecache/12/job.split" "jobSubmitDir/job.split"
mkdir -p jobSubmitDir
ln -sf "/data/2/hadoop/yarn/local/usercache/doug/appcache/application_1406927878786_0001/filecache/11/job.splitmetainfo" "jobSubmitDir/job.splitmetainfo"
ln -sf "/data/1/hadoop/yarn/local/usercache/doug/appcache/application_1406927878786_0001/filecache/10/job.jar" "job.jar"
ln -sf "/data/2/hadoop/yarn/local/usercache/doug/filecache/11/hypertable-0.9.8.0-apache2.jar" "hypertable-0.9.8.0-apache2.jar"
exec /bin/bash -c "$JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/data/4/hadoop/yarn/logs/application_1406927878786_0001/container_1406927878786_0001_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA  -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/data/4/hadoop/yarn/logs/application_1406927878786_0001/container_1406927878786_0001_01_000001/stdout 2>/data/4/hadoop/yarn/logs/application_1406927878786_0001/container_1406927878786_0001_01_000001/stderr "

The line that sets CLASSPATH was the culprit. To resolve the problem I had to set the variables HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, HADOOP_YARN_HOME, and HADOOP_MAPRED_HOME in hadoop-env.sh to point to the appropriate directories under /usr/lib. In each of those directories I also had to set up the share/hadoop/... subdirectory hierarchy where the jars can be found.
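
Roughly, that means adding something like the following to hadoop-env.sh (the /usr/lib paths are just an example layout; point each *_HOME at wherever the matching jars actually live on your nodes):

# Example layout only; each of these directories must contain a share/hadoop/... hierarchy with the jars.
export HADOOP_COMMON_HOME=/usr/lib/hadoop
export HADOOP_HDFS_HOME=/usr/lib/hadoop-hdfs
export HADOOP_YARN_HOME=/usr/lib/hadoop-yarn
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce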

Doug Judd
  • How did you capture the start script? I'm trying to debug a yarn MR issue where the job fails but there is literally nothing in the logs. – Mark Aug 06 '14 at 23:42
  • 3
  • @Mark You may want to set yarn.nodemanager.delete.debug-delay-sec to 600 in yarn-site.xml. From the [docs](http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml): Number of seconds after an application finishes before the nodemanager's DeletionService will delete the application's localized file directory and log directory. To diagnose Yarn application problems, set this property's value large enough (for example, to 600 = 10 minutes) to permit examination of these directories. – adino Sep 08 '14 at 18:22
  • @adino's advice was essential for retaining the logs. For people as confused as I was: the localized app log files stdout, stderr and syslog (the really helpful info was in syslog!) were on the Linux file system of a node in the cluster, in the directory ~hadoop/hadoop-2.6.2/logs/userlogs/application_1450815437271_0004/container_1450815437271_0004_01_000001/ – chrisinmtown Dec 23 '15 at 13:34

I solved this problem with the following.

In my hadoop/etc/hadoop directory (the Hadoop 2.7.3 configuration directory), mapred-site.xml contains:

<configuration>
 <property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
 </property>
 <property>
   <name>mapreduce.jobhistory.address</name>
   <value>zhangjunj:10020</value>
 </property>
 <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>zhangjunj:19888</value>
 </property>
</configuration>

In this file, 'zhangjunj' must be your master's machine name; I had mistakenly written 'hadoop' there in the beginning.
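
A quick sanity check (zhangjunj is just my master's host name; substitute yours) is to confirm the name resolves on every node:

hostname              # on the master: should print the name used in mapred-site.xml
ping -c 1 zhangjunj   # on every node: should resolve via /etc/hosts or DNS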

俊杰张

I fixed the issue; it was due to incorrect paths. Giving the full directory path for mapred, hdfs, yarn & common solved the problem.
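
Roughly, for a tarball install under /opt/yarn/hadoop-2.3.0, the classpath entries need to go all the way down to the share/hadoop/... directories instead of stopping at the install root, something like this (a sketch only; adjust to your layout):

  <property>
    <name>yarn.application.classpath</name>
    <value>/opt/yarn/hadoop-2.3.0/etc/hadoop,/opt/yarn/hadoop-2.3.0/share/hadoop/common/*,/opt/yarn/hadoop-2.3.0/share/hadoop/common/lib/*,/opt/yarn/hadoop-2.3.0/share/hadoop/hdfs/*,/opt/yarn/hadoop-2.3.0/share/hadoop/hdfs/lib/*,/opt/yarn/hadoop-2.3.0/share/hadoop/yarn/*,/opt/yarn/hadoop-2.3.0/share/hadoop/yarn/lib/*,/opt/yarn/hadoop-2.3.0/share/hadoop/mapreduce/*,/opt/yarn/hadoop-2.3.0/share/hadoop/mapreduce/lib/*</value>
  </property>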

Thanks, Tony

TonyMull

Please check the yarn.application.classpath property and make sure all the required jars are actually present on disk:

  <property>
    <name>yarn.application.classpath</name>
    <value>/etc/hadoop/conf,/usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,/usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,/usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,/usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*</value>
  </property>

akshat thakar

Please check the logs first (they will be in the user directory under the logs directory of Hadoop).

Also check the permissions of all the directories you mentioned in the yarn, hdfs and core-site XML files, because in most cases this error is caused by wrong permissions.
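
For example, something along these lines (the path and user below are placeholders; use whatever yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs point to, and whichever user runs the NodeManager):

ls -ld /tmp/hadoop-yarn/nm-local-dir                       # placeholder path; check ownership and mode
sudo chown -R yarn:hadoop /tmp/hadoop-yarn/nm-local-dir    # placeholder user/group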

Harit Singh

Maybe you can run the HistoryServer with the following command under $HADOOP_HOME/sbin:

./mr-jobhistory-daemon.sh start historyserver

Then you can check the Hadoop error logs (history log) from this URL:

http://<ResourceManager host address>:8088/cluster

Most probably you will see a ClassNotFoundException there.

iceberg

I also encountered this issue on Ambari 2.0 + HDP 2.3 + HUE 3.9. My fix was:

1. make sure the Spark client exists on every Hadoop YARN node
2. export SPARK_HOME on every YARN node (Spark client) and on the Hue host
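
For step 2, roughly like this (the path is just a typical HDP location; check where the Spark client actually lives on your nodes), e.g. in the shell profile of the user that submits jobs:

export SPARK_HOME=/usr/hdp/current/spark-client   # typical HDP path; verify on your own nodes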


The permissions on container-executor should be 6050, with owner root and group hadoop:

---Sr-s--- 1 root hadoop /usr/lib/hadoop-yarn/bin/container-executor
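
If yours differ, something like this restores them (using the same path as above):

sudo chown root:hadoop /usr/lib/hadoop-yarn/bin/container-executor
sudo chmod 6050 /usr/lib/hadoop-yarn/bin/container-executor   # setuid/setgid, accessible only to the hadoop group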

Nimmagadda

Check the swap size on your system with free -m. If it shows Swap: 0 0 0, allocate some swap memory.
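
One common way to add swap (a sketch only; the 2 GB size and the /swapfile name are arbitrary):

sudo dd if=/dev/zero of=/swapfile bs=1M count=2048   # create a 2 GB swap file
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
free -m                                              # the Swap line should no longer be all zeros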

Igorock

In my case the problem was due to insufficient memory. I inserted the below into yarn-site.xml, as adino suggested in his comment above:

<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>600</value>
</property>

After that I could see an error in the stderr log file. I don't remember the exact wording (the logfile got deleted after a while), but it was along the lines of "out of memory error".

I edited my virtual machine to add another swap partition of the size 3 Gigabytes (probably total overkill). I did this with Gparted.

Afterwards I had to register the new swap partition by typing

mkswap /dev/sda6    # /dev/sda6 is the partition name
swapon /dev/sda6

I found the UUID of the new swap partition by typing "blkid" and copying it.

I registered the swap into the file fstab:

sudo vi /etc/fstab

I added a new line for the new swap partition. I copied the whole line from the previous swap partition and just changed the UID.

UUID=2d29cddd-e721-4a7b-95c0-7ce52734d8a3 none  swap    sw      0       0

After this, the error disappeared. I'm sure there are more elegant ways to solve this, but it worked for me. I'm pretty new to dealing with Linux.

Laura

You will need to delay log removal by setting yarn.nodemanager.delete.debug-delay-sec to 600.
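
In yarn-site.xml that is:

<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>600</value>
</property>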

This will allow you to browse the stderr, stdout and syslog in /hadoop/yarn/log in the relevant container directory.

Most likely you will find the error in syslog, and most likely it will be a ClassNotFoundException for the class configured via tez.history.logging.service.class, org.apache.tez.dag.history.logging.ats.ATSV15HistoryLoggingService.

If that is the case, then refer to the following ticket:

https://issues.apache.org/jira/browse/AMBARI-15041

Hatchet