
In my Hadoop reducers, I need to know how many successful map tasks were executed in the current job. I've come up with the following, which as far as I can tell does NOT work.

    Counter totalMapsCounter = 
        context.getCounter(JobInProgress.Counter.TOTAL_LAUNCHED_MAPS);
    Counter failedMapsCounter = 
        context.getCounter(JobInProgress.Counter.NUM_FAILED_MAPS);
    long nSuccessfulMaps = totalMapsCounter.getValue() - 
                           failedMapsCounter.getValue();

Alternatively, if there's a good way that I could retrieve (again, from within my reducers) the total number of input splits (not number of files, and not splits for one file, but total splits for the job), that would probably also work. (Assuming my job completes normally, that should be the same number, right?)

Mark
  • The more I think about this, the more I think my problem is actually due to the scope of counters. I can increment and read a counter just fine within a single mapper or reducer, but what I need/want is a way to read a globally aggregated counter value (computed in my mappers and used in my reducers). – Mark Nov 04 '11 at 15:08

2 Answers


Edit: It looks like it's not good practice to retrieve the counters in the map and reduce tasks using Job or JobConf. An alternate approach is to pass the summary details from the mapper to the reducer; this requires some effort to code, but is doable. It would have been nice if the feature were built into Hadoop rather than having to be hand-coded. I have requested that this feature be added to Hadoop and am waiting for a response.


JobCounter.TOTAL_LAUNCHED_MAPS was retrieved using the code below in the Reducer class with the old MR API.

    private String jobID;
    private long launchedMaps;

    public void configure(JobConf jobConf) {
        try {
            jobID = jobConf.get("mapred.job.id");

            // Connect back to the JobTracker to look up this job's counters.
            JobClient jobClient = new JobClient(jobConf);
            RunningJob job = jobClient.getJob(JobID.forName(jobID));

            if (job == null) {
                System.out.println("No job found with ID " + jobID);
            } else {
                Counters counters = job.getCounters();
                launchedMaps = counters.getCounter(JobCounter.TOTAL_LAUNCHED_MAPS);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

With the new API, Reducer implementations can access the job's Configuration via JobContext#getConfiguration(). The above code can be implemented in Reducer#setup().

Reducer#configure() in the old MR API and Reducer#setup() in the new MR API are each invoked once per reduce task, before Reducer#reduce() is invoked.
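A minimal sketch of that setup() approach in the new API follows. This is an assumption-laden sketch, not a tested implementation: it mixes the new mapreduce API with the old mapred API, wraps the task's Configuration in a JobConf (JobConf extends Configuration), and looks the counter up by its group and name strings to avoid depending on the JobInProgress enum. The class name CountingReducer is hypothetical.

```java
// Sketch only: mixes the new (mapreduce) and old (mapred) APIs, which may
// break in later Hadoop releases.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapreduce.Reducer;

public class CountingReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    private long launchedMaps;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        try {
            Configuration conf = context.getConfiguration();
            // Wrapping in a new JobConf is safer than downcasting, since
            // getConfiguration() is not guaranteed to return a JobConf.
            JobConf jobConf = new JobConf(conf);

            JobClient jobClient = new JobClient(jobConf);
            RunningJob job = jobClient.getJob(JobID.forName(conf.get("mapred.job.id")));
            if (job != null) {
                Counters counters = job.getCounters();
                // Look up the framework counter by group/name strings.
                launchedMaps = counters.findCounter(
                        "org.apache.hadoop.mapred.JobInProgress$Counter",
                        "TOTAL_LAUNCHED_MAPS").getValue();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

The counter group string shown here is what Hadoop 0.20/1.x used; it would need to be adjusted for other versions.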

BTW, the counters can also be retrieved from a JVM other than the one that launched the job.

JobInProgress is annotated as below, so it should not be used: this API is intended only for a limited set of projects, and the interface may change.

@InterfaceAudience.LimitedPrivate({"MapReduce"})
@InterfaceStability.Unstable

Note that JobCounter.TOTAL_LAUNCHED_MAPS also includes map tasks launched due to speculative execution.

Praveen Sripati
    I believe you've only outlined my proposed solution which does not work from within the reducer. It will work if you access the counters from the Job in the JVM that kicks off the job, but don't think this will work from within the reducer (I've already tried it). – Mark Nov 09 '11 at 00:20
  • Changed the response with a different approach. Later, found a similar SO [query](http://goo.gl/q7R2y) with a similar solution. – Praveen Sripati Nov 09 '11 at 13:28
  • @Thomas - what do you mean by version dependent? – Praveen Sripati Nov 10 '11 at 09:35
  • @PraveenSripati Interesting approach, but don't think it's doable in the new API (which is what I'm using). Can only instantiate a JobClient with a JobConf, but what you get from Context#getConfiguration() is of type Configuration, not JobConf. – Mark Nov 10 '11 at 15:28
  • @Mark - o.a.h.mapred.JobConf extends o.a.h.conf.Configuration. So, using a simple typecast `JobConf jobConf = (JobConf) context.getConfiguration();` fixed the problem. Was able to get the Counters in the new API also. BTW, there is not much [difference](http://goo.gl/qGH4J) between the old and new API. – Praveen Sripati Nov 10 '11 at 17:32
  • Well, I really don't like this solution. It smells bad for a number of reasons: I had to downcast twice and use 3 deprecated classes. This screams brittle and is unlikely to work in future releases. This is also clearly mixing the old API and the new which is unlikely to have been well tested and is also unlikely to continue to work. Having said that, this solution DOES work (at least for version 0.20.2-cdh3u1), and I have not come up with--or been offered--anything better. I blame the Hadoop library for not enabling a more robust solution. I hope they will in future releases. Thanks. – Mark Nov 14 '11 at 13:44
  • @Mark - I agree - it's a hack to the job done - I agree that it's not the best solution. Open a JIRA as a feature request. With the old API it works fine. – Praveen Sripati Nov 14 '11 at 14:11
  • @Mark - http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201112.mbox/%3CCAFE9998.2FEF6%25evans@yahoo-inc.com%3E - check this approach of using a special key/value pair to send summary data from mapper to reducers. It is not a straight forward way, but it is doable. – Praveen Sripati Dec 03 '11 at 07:32
  • @PraveenSripati Interesting... it sounds like a better hack, but it also sounds like a lot of work. Thanks for the tip! – Mark Dec 03 '11 at 15:00
  • @Mark - it would be better if it gets built into the Hadoop framework. – Praveen Sripati Dec 03 '11 at 17:19
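The special key/value-pair approach mentioned in the comments above can be sketched roughly as follows. The idea: each mapper emits, in cleanup(), a special record whose key sorts before every real key, carrying that mapper's summary (here, its record count); the reducer then receives all summary records before any real data. All class and constant names below (SummaryPassing, SUMMARY_KEY, etc.) are hypothetical. With multiple reducers you would additionally need to route the summary record to every partition, which a plain partitioner cannot do, so this sketch assumes a single reducer.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class SummaryPassing {

    // "\0" sorts before any printable character, so summary records
    // reach the reducer before every real key.
    static final String SUMMARY_KEY = "\0SUMMARY";

    public static class CountingMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private long recordsSeen = 0;

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            recordsSeen++;
            context.write(value, new LongWritable(1));
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            // Emit this mapper's summary as one special record.
            context.write(new Text(SUMMARY_KEY), new LongWritable(recordsSeen));
        }
    }

    public static class SummaryReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        private long totalMapRecords = 0;

        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            if (SUMMARY_KEY.equals(key.toString())) {
                // Summary records from all mappers sort first; total them
                // before any real key is processed.
                totalMapRecords += sum;
            } else {
                context.write(key, new LongWritable(sum));
            }
        }
    }
}
```

The summary value could just as well carry any per-mapper statistic, not only a record count.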

Using the new API, I retrieved one user-defined counter (an enum in the Mapper) and a built-in counter. This code goes in the reducer's setup() method, although I had to use some classes from the old API (the mapred package):

    // The JobContext wrapper is not strictly necessary; context.getConfiguration()
    // could be used directly.
    JobContext jobContext = new JobContext(context.getConfiguration(), context.getJobID());
    Configuration c = jobContext.getConfiguration();

    jobID = c.get("mapred.job.id");

    // Old (mapred) API classes are needed to query the running job's counters.
    JobClient jobClient = new JobClient(new JobConf(c));
    RunningJob job = jobClient.getJob((org.apache.hadoop.mapred.JobID) JobID.forName(jobID));

    Counters counters = job.getCounters();

    long customCounterCount = counters.getCounter(WordCountMapper.CustomCounters.COUNT);
    long totalMapInputRecords = counters.getCounter(Task.Counter.MAP_INPUT_RECORDS);

    System.out.println("customCounterCount==> " + customCounterCount);
    System.out.println("totalMapInputRecords==> " + totalMapInputRecords);
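For completeness, the user-defined counter referenced above (WordCountMapper.CustomCounters.COUNT) could be defined and incremented in the mapper along these lines. This is a sketch of the typical enum-counter pattern, not the poster's actual mapper; the framework aggregates each task's increments into job-wide totals.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // A user-defined counter is just an enum; Hadoop aggregates increments
    // from all tasks into a single job-wide value per enum constant.
    public enum CustomCounters { COUNT }

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            word.set(token);
            context.write(word, ONE);
            // Increment the custom counter once per token emitted.
            context.getCounter(CustomCounters.COUNT).increment(1);
        }
    }
}
```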