
I am trying to run multiple MapReduce jobs in Hadoop. After searching on Google, I went with method 2 as described at http://cloudcelebrity.wordpress.com/2012/03/30/how-to-chain-multiple-mapreduce-jobs-in-hadoop/ : use JobControl. I got the following error:

/examples2/format/Dictionary.java:100: error: no suitable method found for addJob(org.apache.hadoop.mapreduce.Job)
jbcntrl.addJob(job);
       ^
method JobControl.addJob(org.apache.hadoop.mapred.jobcontrol.Job) is not applicable
      (actual argument org.apache.hadoop.mapreduce.Job cannot be converted to org.apache.hadoop.mapred.jobcontrol.Job by method invocation conversion)

As described at "Is it better to use the mapred or the mapreduce package to create a Hadoop Job?", there are two different APIs, which seem to be misaligned here. After looking further, I found "JobControl and JobConf.setMapperClass() error". They say that using the mapreduce package (org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl instead of org.apache.hadoop.mapred.jobcontrol.JobControl) should solve it. The only problem is: I am already using that one. When I take a look at this particular file (hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/jobcontrol/JobControl.java in the source code), I see it is using

import org.apache.hadoop.mapred.jobcontrol.Job;

instead of

import org.apache.hadoop.mapreduce.Job;

This seems to me to be what is causing the error (correct?). Is there any way, other than reverting the code back to mapred, to get around this? Or any other way of running multiple M/R jobs?

Update: I got method 1 from http://cloudcelebrity.wordpress.com/2012/03/30/how-to-chain-multiple-mapreduce-jobs-in-hadoop/ to work (sketched below), but I am still interested in an answer to the original problem.
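
For reference, method 1 from that post boils down to running the jobs back to back from the driver, feeding the output path of the first job in as the input path of the second. A rough sketch of that pattern, assuming the newer mapreduce API; the paths and job names are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // First job: reads the raw input, writes to an intermediate directory.
        Job job1 = Job.getInstance(conf, "job1");
        // [set mapper/reducer/key/value classes for job1 here]
        FileInputFormat.addInputPath(job1, new Path("/input"));
        FileOutputFormat.setOutputPath(job1, new Path("/intermediate"));
        if (!job1.waitForCompletion(true)) {  // block until job1 finishes
            System.exit(1);
        }

        // Second job: consumes the intermediate output of job1.
        Job job2 = Job.getInstance(conf, "job2");
        // [set mapper/reducer/key/value classes for job2 here]
        FileInputFormat.addInputPath(job2, new Path("/intermediate"));
        FileOutputFormat.setOutputPath(job2, new Path("/output"));
        System.exit(job2.waitForCompletion(true) ? 0 : 1);
    }
}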

Thomas Hubregtsen

3 Answers


mapred is the older API.

Please switch to mapreduce when coding further MR programs.

  • The mapreduce API is more compact and encapsulates most things in the Context class, making the coder's life simpler (a minimal sketch follows).
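
For illustration, here is a minimal mapper written against the newer API; the class and counter names are made up for the example, but note how emitting output and updating counters both go through the single Context object:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// A tokenizing mapper using the newer org.apache.hadoop.mapreduce API.
// Output, counters, and configuration are all reached through Context.
public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            word.set(token);
            context.write(word, ONE);  // emit a (word, 1) pair via the context
        }
        context.getCounter("stats", "lines").increment(1);  // counters via the context too
    }
}
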
KrazyGautam

Quite some time has passed since you asked the question, but the problem is that you are adding the wrong object to the JobControl. You need to wrap the Job in a class called ControlledJob; only then can you add it to the JobControl. Here is a small example:

import java.util.Arrays;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

Job jobWordCount = Job.getInstance [...]
[setup jobWordCount]
Job jobSort = Job.getInstance [...]
[setup jobSort]

JobControl jobControl = new JobControl("word-count-control") {{
    // Wrap each Job in a ControlledJob; the second argument lists the jobs it depends on.
    ControlledJob count = new ControlledJob(jobWordCount, null);
    ControlledJob sort = new ControlledJob(jobSort, Arrays.asList(count));
    addJob(count);
    addJob(sort);  // sort only starts once count has finished successfully
}};
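
To actually execute the graph, note that JobControl is a Runnable: a common pattern is to drive it from its own thread and poll until all jobs are done. A sketch of that, with error and interrupt handling omitted for brevity:

// JobControl implements Runnable: run it on its own thread and poll.
Thread controlThread = new Thread(jobControl, "word-count-control-thread");
controlThread.setDaemon(true);
controlThread.start();

while (!jobControl.allFinished()) {
    Thread.sleep(500);  // wait for both jobs to complete
}
jobControl.stop();  // shut down the control loop

if (!jobControl.getFailedJobList().isEmpty()) {
    System.err.println("Failed jobs: " + jobControl.getFailedJobList());
}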

Here is an example you may also look at.

edi

Oozie is a system for describing the workflow of a job, where that job may contain a set of MapReduce jobs, Pig scripts, file system operations, etc., and it supports forking and joining of the data flow.

The Oozie documentation has an example with multiple MR jobs, including a fork:

http://oozie.apache.org/docs/3.2.0-incubating/WorkflowFunctionalSpec.html#Appendix_B_Workflow_Examples

madhu
  • Well, this question is related to MapReduce jobs, and therefore you may want to consult the [JobControl documentation](http://hadoop.apache.org/docs/r2.8.3/api/org/apache/hadoop/mapreduce/lib/jobcontrol/JobControl.html) of the YARN/Hadoop implementation first – edi Oct 24 '19 at 14:49