I am trying to implement iterative MapReduce in Hadoop. The result of the first MapReduce job is a MapWritable containing two DoubleArrayWritable values. Part of my first mapper is:
    // Copy matrix T into a DoubleWritable[][] so it can be wrapped in a DoubleArrayWritable
    DoubleWritable[][] Tdata = new DoubleWritable[T.numRows()][T.numColumns()];
    for (int k = 0; k < Tdata.length; k++) {
        for (int j = 0; j < Tdata[k].length; j++) {
            Tdata[k][j] = new DoubleWritable(T.get(k, j));
        }
    }
    DoubleArrayWritable t = new DoubleArrayWritable();
    t.set(Tdata);

    // Same conversion for matrix H
    DoubleWritable[][] Hdata = new DoubleWritable[H.numRows()][H.numColumns()];
    for (int k = 0; k < Hdata.length; k++) {
        for (int j = 0; j < Hdata[k].length; j++) {
            Hdata[k][j] = new DoubleWritable(H.get(k, j));
        }
    }
    DoubleArrayWritable h = new DoubleArrayWritable();
    h.set(Hdata);

    // mw is the MapWritable value: key 0 -> H, key 1 -> T
    mw.put(new IntWritable(0), h);
    mw.put(new IntWritable(1), t);
    context.write(new Text(splitId), mw);
By using an identity reducer, I get the mapper output unchanged as the final output. Now I want to use this output as input to an iterative MapReduce job. The problem is that one global variable gets updated in each iteration, and I want to pass it to the mappers of the next iteration along with the output of the first MapReduce job. Code snippet from the driver class:
    for (it = 0; it < 10; it++) { // change the stopping condition
        outPath = new Path(inPath + "_" + it);
        // delete existing directory
        if (hdfs.exists(outPath)) {
            hdfs.delete(outPath, true);
        }
        Job job2 = new Job(conf, "OutputWeightCalc");
        job2.setMapperClass(secMapper.class);
        job2.setMapOutputKeyClass(Text.class);
        job2.setMapOutputValueClass(MapWritable.class);
        job2.setReducerClass(finalReducer.class);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(MapWritable.class);
        job2.setInputFormatClass(SequenceFileInputFormat.class);
        job2.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(job2, inPath);
        FileOutputFormat.setOutputPath(job2, outPath);
        job2.waitForCompletion(true);
        count = job2.getCounters();
        inPath = outPath;
    }
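To make the intended data flow concrete, here is a minimal plain-Java sketch of the loop I have in mind. It does not use Hadoop. The class name IterationSketch, the methods record, mapStep and runIterations, and the update arithmetic are all hypothetical placeholders, not my real code. The only point is that every iteration's "map" step needs both the unchanged first-job records (H under key 0, T under key 1) and the global value produced by the previous iteration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IterationSketch {

    /** Stand-in for one first-job record: H stored under key 0, T under key 1. */
    static Map<Integer, double[][]> record(double[][] h, double[][] t) {
        Map<Integer, double[][]> mw = new LinkedHashMap<>();
        mw.put(0, h);
        mw.put(1, t);
        return mw;
    }

    /**
     * Hypothetical "map" step: it sees one record AND the current global value.
     * The arithmetic is a placeholder: half the global plus the sum of all entries.
     */
    static double mapStep(Map<Integer, double[][]> rec, double global) {
        double sum = 0.0;
        for (double[][] matrix : rec.values()) {
            for (double[] row : matrix) {
                for (double v : row) {
                    sum += v;
                }
            }
        }
        return 0.5 * global + sum;
    }

    /**
     * Driver loop: the first-job output stays fixed, while the global value is
     * recomputed in each iteration (placeholder "reduce": sum of the partials)
     * and handed to the next iteration's map steps.
     */
    static double runIterations(Map<String, Map<Integer, double[][]>> firstJobOutput,
                                double initialGlobal, int iterations) {
        double global = initialGlobal;
        for (int it = 0; it < iterations; it++) {
            double next = 0.0;
            for (Map<Integer, double[][]> rec : firstJobOutput.values()) {
                next += mapStep(rec, global);
            }
            global = next; // becomes an input of the next iteration
        }
        return global;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, double[][]>> out = new LinkedHashMap<>();
        out.put("split-0", record(new double[][] {{1, 2}}, new double[][] {{3}}));
        out.put("split-1", record(new double[][] {{1}}, new double[][] {{1}}));
        System.out.println("global after 3 iterations: " + runIterations(out, 0.0, 3));
    }
}
```

In the real job the fixed records live in the SequenceFile written by the first job, and the global value is whatever the previous iteration produced, so each mapper would need access to both.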
Now the problem is: how can I merge the two outputs into one and pass it as the input path to the mappers of the next iteration? I thought of merging the two SequenceFiles created as MR job output, but I don't know how to do that. Any help would be appreciated.
Thank you!!