9

Most Hadoop MapReduce programs look like this:

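import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyApp extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() is inherited from Configured
        Job job = new Job(getConf());
        /* process command line options */
        return job.waitForCompletion(true) ? 0 : 1;
    }
    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new MyApp(), args);
        System.exit(exitCode);
    }
}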

What is the purpose of Configured? Tool and Configured both have getConf() and setConf() in common, so what does Configured provide to our application?

Sagar Zala
Majid Azimi

2 Answers

11

Configured is a concrete implementation of the Configurable interface. It is the base class that supplies the implementations of getConf() and setConf().

Merely extending this base class lets the subclass be configured with a Configuration object, and Configuration itself has more than one implementation (JobConf, for example, is a subclass of it).

When your code executes the following line,

ToolRunner.run(new MyApp(), args);

internally it does this:

ToolRunner.run(tool.getConf(), tool, args);

Here tool is the MyApp instance, which implements Tool. As you said, Tool declares getConf(), but only as an interface method. The implementation comes from the Configured base class. If you do not extend Configured in the code above, you have to implement getConf() and setConf() yourself.
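To make the difference concrete, here is a minimal stand-alone sketch of the two options. It does not use the Hadoop classes: SketchConf, SketchConfigurable, SketchTool, SketchConfigured and both app classes are hypothetical stand-ins that only mirror the shape of Configuration, Configurable, Tool and Configured, and the demo.ok key is made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration: just a string map.
class SketchConf {
    private final Map<String, String> values = new HashMap<>();
    void set(String key, String value) { values.put(key, value); }
    String get(String key) { return values.get(key); }
}

// Mirrors the shape of Configurable: a setter and a getter for the configuration.
interface SketchConfigurable {
    void setConf(SketchConf conf);
    SketchConf getConf();
}

// Mirrors the shape of Tool: Configurable plus a run method.
interface SketchTool extends SketchConfigurable {
    int run(String[] args) throws Exception;
}

// Mirrors what Configured provides: the field and the two accessors.
class SketchConfigured implements SketchConfigurable {
    private SketchConf conf;
    @Override public void setConf(SketchConf conf) { this.conf = conf; }
    @Override public SketchConf getConf() { return conf; }
}

// Option 1: inherit the accessors and write only run().
class InheritingApp extends SketchConfigured implements SketchTool {
    @Override public int run(String[] args) {
        return "yes".equals(getConf().get("demo.ok")) ? 0 : 1;
    }
}

// Option 2: no base class, so the accessors are written by hand.
class HandWrittenApp implements SketchTool {
    private SketchConf conf;
    @Override public void setConf(SketchConf conf) { this.conf = conf; }
    @Override public SketchConf getConf() { return conf; }
    @Override public int run(String[] args) {
        return "yes".equals(getConf().get("demo.ok")) ? 0 : 1;
    }
}
```

Both classes behave the same from the caller's point of view. The only thing the base class saves you is the field and the two accessor methods.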

shazin
3

Configured is a default implementation of the Configurable interface. Basically, its setConf() method stores the passed Configuration object in a private instance variable, and getConf() returns that reference.

Tool is an extension of the Configurable interface that adds a run(..) method. It is used with ToolRunner, which parses the command line options (using the GenericOptionsParser) and builds a Configuration object that is then passed to the setConf(..) method.

Your main class will typically extend Configured so that the Configurable interface methods required by Tool are implemented for you.

In general you should use the ToolRunner utility class to launch your MapReduce jobs, as it handles the common task of parsing the command line arguments and building the Configuration object. I'd look at the API docs for ToolRunner for more info.
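The flow described above (parse the generic options, hand the resulting configuration to setConf(), then call run() with whatever arguments are left) can be sketched roughly as follows. This is a simplified model and not the Hadoop source: MiniTool, MiniToolRunner and EchoTool are hypothetical names, a plain Map stands in for Configuration, and only the "-D key=value" form is handled, whereas the real GenericOptionsParser understands more options.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Tool, with a Map playing the role of Configuration.
interface MiniTool {
    void setConf(Map<String, String> conf);
    Map<String, String> getConf();
    int run(String[] args);
}

// Hypothetical stand-in for ToolRunner; only "-D key=value" is recognised here.
final class MiniToolRunner {
    static int run(MiniTool tool, String[] args) {
        // Start from the tool's own configuration, or a fresh one if it has none.
        Map<String, String> conf = tool.getConf();
        if (conf == null) {
            conf = new HashMap<>();
        }
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            boolean isOption = "-D".equals(args[i])
                    && i + 1 < args.length
                    && args[i + 1].contains("=");
            if (isOption) {
                String[] kv = args[++i].split("=", 2);
                conf.put(kv[0], kv[1]);   // generic option goes into the configuration
            } else {
                remaining.add(args[i]);   // everything else is left for the tool
            }
        }
        tool.setConf(conf);               // hand the populated configuration to the tool
        return tool.run(remaining.toArray(new String[0]));
    }
}

// Example tool that simply records the arguments it was given.
class EchoTool implements MiniTool {
    private Map<String, String> conf;
    String[] seenArgs;
    @Override public void setConf(Map<String, String> conf) { this.conf = conf; }
    @Override public Map<String, String> getConf() { return conf; }
    @Override public int run(String[] args) { seenArgs = args; return 0; }
}
```

With this model, calling MiniToolRunner.run(new EchoTool(), new String[] {"-D", "mapreduce.job.reduces=2", "in", "out"}) puts the property into the tool's configuration and passes only "in" and "out" to run(). That separation is the reason run() in a real Tool can read its settings from getConf() and treat args as application arguments only.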

Chris White