I'm using the new Hadoop API to write a sequence of map-reduce jobs. I plan to use Oozie to pipeline all of these together, but I can't find a way to produce multiple output streams from a map-reduce node in the workflow.
Normally, to write multiple outputs I would use code similar to the example in the MultipleOutputs javadoc, but Oozie gets all of its configuration from the workflow.xml file, so the named outputs cannot be configured the way they are in that example.
I've come across a thread discussing the use of multiple outputs in Oozie, but there was no solution presented beyond creating a Java task and adding it to the Oozie pipeline directly.
Is there a way to do this via a map-reduce node in the workflow.xml?
Edit:
Chris's solution did work, though I wish there was a better way. Here are the exact changes I made.
I added the following to the workflow.xml file:
<property>
  <name>mapreduce.multipleoutputs</name>
  <value>${output1} ${output2}</value>
</property>
<property>
  <name>mapreduce.multipleoutputs.namedOutput.${output1}.key</name>
  <value>org.apache.hadoop.io.Text</value>
</property>
<property>
  <name>mapreduce.multipleoutputs.namedOutput.${output1}.value</name>
  <value>org.apache.hadoop.io.LongWritable</value>
</property>
<property>
  <name>mapreduce.multipleoutputs.namedOutput.${output1}.format</name>
  <value>org.apache.hadoop.mapreduce.lib.output.TextOutputFormat</value>
</property>
<property>
  <name>mapreduce.multipleoutputs.namedOutput.${output2}.key</name>
  <value>org.apache.hadoop.io.Text</value>
</property>
<property>
  <name>mapreduce.multipleoutputs.namedOutput.${output2}.value</name>
  <value>org.apache.hadoop.io.LongWritable</value>
</property>
<property>
  <name>mapreduce.multipleoutputs.namedOutput.${output2}.format</name>
  <value>org.apache.hadoop.mapreduce.lib.output.TextOutputFormat</value>
</property>
I added the following to the job.properties file that is fed to Oozie at startup:
output1=totals
output2=uniques
Then, in the reducer, I wrote to the named outputs totals and uniques.
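For anyone wiring this up, here is a rough sketch of where the multiple-output properties might sit inside the map-reduce action. It is not my exact workflow: the action name, the ${jobTracker} and ${nameNode} parameters, and the "end" and "fail" transition targets are placeholders I'm assuming for illustration, and I'm assuming the job is switched to the new API through the mapred.mapper.new-api and mapred.reducer.new-api flags.

```xml
<!-- Sketch only: the action name and the ok/error targets are hypothetical. -->
<action name="multi-output-mr">
  <map-reduce>
    <!-- ${jobTracker} and ${nameNode} are assumed to be defined in job.properties -->
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <!-- Assumed: run the job with the new (org.apache.hadoop.mapreduce) API -->
      <property>
        <name>mapred.mapper.new-api</name>
        <value>true</value>
      </property>
      <property>
        <name>mapred.reducer.new-api</name>
        <value>true</value>
      </property>
      <!-- Space-separated list of named outputs, resolved from job.properties -->
      <property>
        <name>mapreduce.multipleoutputs</name>
        <value>${output1} ${output2}</value>
      </property>
      <!-- The per-output .key, .value, and .format properties go here as well -->
    </configuration>
  </map-reduce>
  <ok to="end"/>
  <error to="fail"/>
</action>
```

With output1=totals and output2=uniques, the name passed as the first argument to MultipleOutputs.write(namedOutput, key, value) in the reducer has to match one of those two values.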