I have a streaming job that I run through Oozie. It runs successfully with just a mapper and a reducer, but I cannot figure out how to pass a combiner. The mapper, reducer, and combiner are all written in Python. Will the following work?
<map-reduce>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <prepare>
        <delete path="${HADOOP_LIB}/OutPath"/>
    </prepare>
    <streaming>
        <mapper>python mapper.py</mapper>
        <combiner>python combiner.py</combiner>
        <reducer>python reducer.py</reducer>
    </streaming>
    <configuration>
        <property>
            <name>mapred.input.dir</name>
            <value>${HADOOP_LIB}/input</value>
        </property>
        <property>
            <name>mapred.output.dir</name>
            <value>${HADOOP_LIB}/OutPath</value>
        </property>
    </configuration>
    <file>mapper.py</file>
    <file>combiner.py</file>
    <file>reducer.py</file>
</map-reduce>
I could not find any documentation for a <combiner> tag inside <streaming>. Alternatively, can I just run the streaming jar with the -combiner option from a shell script and call that script from Oozie?
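
For reference, the shell-script alternative I have in mind would look roughly like the sketch below. It is untested. The action name, the `end`/`fail` transition targets, the script name `run_streaming.sh`, and the streaming jar path are all placeholders I made up for illustration, and I am assuming the Oozie shell action (`uri:oozie:shell-action:0.1`) is available on the cluster.

```xml
<!-- Hypothetical shell action: run_streaming.sh, the action name, and the
     ok/error targets are placeholders, not part of my real workflow. -->
<action name="streaming-with-combiner">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="${HADOOP_LIB}/OutPath"/>
        </prepare>
        <exec>run_streaming.sh</exec>
        <!-- Passed to the script as $1 and $2, since Oozie EL is not expanded inside the script -->
        <argument>${HADOOP_LIB}/input</argument>
        <argument>${HADOOP_LIB}/OutPath</argument>
        <file>run_streaming.sh</file>
        <file>mapper.py</file>
        <file>combiner.py</file>
        <file>reducer.py</file>
        <!--
          run_streaming.sh would contain something like this
          (the jar location is a placeholder and depends on the installation):

          hadoop jar /path/to/hadoop-streaming.jar \
              -files mapper.py,combiner.py,reducer.py \
              -mapper "python mapper.py" \
              -combiner "python combiner.py" \
              -reducer "python reducer.py" \
              -input "$1" \
              -output "$2"
        -->
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

The idea is that the `<file>` entries place the scripts in the action's working directory, and the streaming command's `-files` option then ships them to the tasks. I do not know whether this is the recommended approach or whether the map-reduce action can be made to accept a combiner directly.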