I am working on a MapReduce project using Hadoop. I currently have three sequential jobs. I want to use Hadoop counters, but the problem is that the actual counting happens in the first job, while I need to access the counter's value in the reducer of the third job.
How can I achieve this? Where should I define the enum? Do I need to pass it through the second job? It would also help to see a code example for doing this, as I couldn't find anything yet.
Note: I am using Hadoop 2.7.2
EDIT: I already tried the approach explained here and it didn't succeed. My case is different, as I want to access the counters from a different job (not from a mapper to a reducer of the same job).
What I tried to do: First Job:
public static void startFirstJob(String inputPath, String outputPath)
        throws IOException, ClassNotFoundException, InterruptedException {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "wordCount");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(WordCountMapper.class);
    job.setCombinerClass(WordCountReducer.class);
    job.setReducerClass(WordCountReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(inputPath));
    FileOutputFormat.setOutputPath(job, new Path(outputPath));
    job.waitForCompletion(true);
}
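For context, the counting itself happens in the first job's reducer. This is a simplified sketch of it (the real class does more); the `context.getCounter(...).increment(...)` line is the part that matters here:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// in the real code this is a static nested class of WordCount
public static class WordCountReducer
        extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable value : values) {
            sum += value.get();
        }
        // this is the increment I want to read back in the third job
        context.getCounter(CountersClass.N_COUNTERS.SOMECOUNT).increment(sum);
        context.write(key, new LongWritable(sum));
    }
}
```

One thing I noticed while pasting this: the driver also sets WordCountReducer as the combiner, so the counter may be incremented in both the combine and the reduce phase, which would inflate the value; but that shouldn't affect where the counter is visible.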
Defined the counter enum in a different class:
public class CountersClass {
    public static enum N_COUNTERS {
        SOMECOUNT
    }
}
Trying to read the counter (this code runs in the third job's reducer):
Cluster cluster = new Cluster(context.getConfiguration());
// I suspect this line is part of the problem: JobID.forName() expects a real
// job ID string (e.g. "job_1450000000000_0001"), not the job's display name
Job job = cluster.getJob(JobID.forName("wordCount"));
Counters counters = job.getCounters();
CountersClass.N_COUNTERS mycounter = CountersClass.N_COUNTERS.valueOf("SOMECOUNT");
Counter c1 = counters.findCounter(mycounter);
long N_Count = c1.getValue();
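A workaround I am considering, in case reading a finished job's counters from inside another job's reducer is simply not possible: read the counter in the driver right after the first job completes, stash the value in the third job's Configuration, and read it back in the reducer's setup(). The "my.somecount" key and the class/method names below are made up for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;

public class CounterPassThrough {

    // Driver side: after the first job finishes, pull the counter value out
    // and hand it to the third job through its Configuration.
    public static void runJobs(Job firstJob, Configuration thirdJobConf)
            throws IOException, ClassNotFoundException, InterruptedException {
        firstJob.waitForCompletion(true);
        long someCount = firstJob.getCounters()
                .findCounter(CountersClass.N_COUNTERS.SOMECOUNT).getValue();
        // "my.somecount" is an arbitrary configuration key I made up;
        // it must be set before the third job is submitted
        thirdJobConf.setLong("my.somecount", someCount);
        // ... build and run the second and third jobs here ...
    }

    // Reducer of the third job: read the stashed value back in setup().
    public static class ThirdJobReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        private long someCount;

        @Override
        protected void setup(Context context) {
            someCount = context.getConfiguration().getLong("my.somecount", 0L);
        }
    }
}
```

Would this be the idiomatic way, or is there a way to make the cluster.getJob() approach above work?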