
I have written my own Hadoop program and I can run it in pseudo-distributed mode on my laptop. However, when I put the program on a cluster that can run the Hadoop example jars, it launches a local job by default even though I pass HDFS file paths. Below is the output; any suggestions?

./hadoop -jar MyRandomForest_oob_distance.jar  hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1
12/03/16 16:21:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/16 16:21:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/16 16:21:25 INFO mapred.JobClient: Running job: job_local_0001
12/03/16 16:21:25 INFO mapred.MapTask: io.sort.mb = 100
12/03/16 16:21:25 INFO mapred.MapTask: data buffer = 79691776/99614720
12/03/16 16:21:25 INFO mapred.MapTask: record buffer = 262144/327680
12/03/16 16:21:25 WARN mapred.LocalJobRunner: job_local_0001
java.io.FileNotFoundException: File /user/randomforest/input/genotype1.txt does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
    at Data.Data.loadData(Data.java:103)
    at MapReduce.DearMapper.loadData(DearMapper.java:261)
    at MapReduce.DearMapper.setup(DearMapper.java:332)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/03/16 16:21:26 INFO mapred.JobClient:  map 0% reduce 0%
12/03/16 16:21:26 INFO mapred.JobClient: Job complete: job_local_0001
12/03/16 16:21:26 INFO mapred.JobClient: Counters: 0
Total Running time is: 1 secs
user974270

3 Answers


LocalJobRunner has been chosen because your configuration most probably has the mapred.job.tracker property set to local, or not set at all (in which case the default is local). To check, go to "wherever you extracted/installed hadoop"/etc/hadoop/ and see if the file mapred-site.xml exists (for me it did not; there was a file called mapred-site.xml.template). In that file (create it if it doesn't exist), make sure it has the following property:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
  • See the source for org.apache.hadoop.mapred.JobClient.init(JobConf)

What is the value of this configuration property in the hadoop configuration on the machine you are submitting this from? Also confirm that the hadoop executable you are running references this configuration (and that you don't have 2+ installations configured differently) - type `which hadoop` and trace any symlinks you come across.

Alternatively, if you know the JobTracker host and port number, you can override this when you submit your job using the -jt option:

hadoop jar MyRandomForest_oob_distance.jar -jt hostname:port hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1
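Note that the -jt option (like -fs and -D) is only picked up if the driver goes through GenericOptionsParser, which is exactly what the "Applications should implement Tool" warning in the question's output is about. A minimal driver sketch showing the Tool/ToolRunner pattern (MyDriver, the job name, and the two-argument layout are illustrative, not the question's actual code):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Implementing Tool lets ToolRunner strip generic options such as
// -jt host:port or -fs hdfs://... before run() sees the remaining args.
public class MyDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects any -D/-jt/-fs overrides from the command line
        Job job = Job.getInstance(getConf(), "my job");
        job.setJarByClass(MyDriver.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
    }
}
```

With this in place, generic options go before the application arguments on the command line, e.g. hadoop jar myjob.jar MyDriver -jt hostname:port input output.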
Steven Magana-Zook
Chris White
  • I finally found the reason: I used -jar instead of jar while running the program .. :( – user974270 Apr 24 '12 at 09:27
  • @user974270 But why does it launch a local job when using -jar? Do you know the reason? – thomaslee Jul 23 '13 at 09:51
  • I am facing the same problem but in psuedo distributed mode.. http://stackoverflow.com/questions/32787996/hadoop-help-required-to-understand-the-processing-steps?noredirect=1#comment53418582_32787996. Please help. – Ajay Sep 26 '15 at 10:58
  • What is the interaction between these two settings: `mapred.job.tracker` and `mapreduce.framework.name`? Does one override the other? Does one apply for mr1 and the other for mr2? What happens when you submit a mr1 job in a yarn environment? – JMess Jun 19 '17 at 17:34

If you're using Hadoop 2 and your job is running locally instead of on the cluster, ensure that you have set up mapred-site.xml to contain the mapreduce.framework.name property with a value of yarn. You also need to set up an aux-service in yarn-site.xml.

Check out the Cloudera Hadoop 2 operator migration blog for more information.
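For reference, the aux-service this answer refers to is the MapReduce shuffle service; a minimal yarn-site.xml fragment for the NodeManagers, using the stock Hadoop 2 property names, looks like this:

```xml
<configuration>
  <!-- Register the MapReduce shuffle handler as a NodeManager aux-service -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```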

Graham Lea

I had the same problem: every MapReduce v2 (MRv2) / YARN task only ran with the mapred.LocalJobRunner

INFO mapred.LocalJobRunner: Starting task: attempt_local284299729_0001_m_000000_0

The ResourceManager and NodeManagers were accessible and mapreduce.framework.name was set to yarn.

Setting HADOOP_MAPRED_HOME before executing the job fixed the problem for me:

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

cheers dan

thesonix