Normally, a Hadoop map/reduce job produces a list of key-value pairs that are written to the job's output file (using an OutputFormat
class). It is rare that both keys and values are useful; usually either the keys or the values contain the required information.
Is there an option (on the client side) to suppress either the keys or the values in the output file?
If I wanted to do this for just one particular job, I could create a new OutputFormat
implementation that ignores keys or values. But I need a generic solution that is reusable across many jobs.
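To make "ignore keys" concrete, here is a rough sketch of the idea in plain Java. It does not use the Hadoop API: `SimpleRecordWriter`, `LineWriter`, and `KeySuppressingWriter` are hypothetical names made up for illustration, and the tab-separated line format is an assumption. The wrapper simply drops the key before handing the pair to the underlying writer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a record writer; NOT the real Hadoop interface.
interface SimpleRecordWriter<K, V> {
    void write(K key, V value);
}

// Toy line-oriented writer: emits "key<TAB>value", or only the value
// when the key is null (assumed behavior for this sketch).
final class LineWriter<K, V> implements SimpleRecordWriter<K, V> {
    private final List<String> lines = new ArrayList<>();

    @Override
    public void write(K key, V value) {
        lines.add(key == null ? String.valueOf(value) : key + "\t" + value);
    }

    List<String> lines() {
        return lines;
    }
}

// Wrapper that discards every key and forwards only the value.
final class KeySuppressingWriter<K, V> implements SimpleRecordWriter<K, V> {
    private final SimpleRecordWriter<K, V> delegate;

    KeySuppressingWriter(SimpleRecordWriter<K, V> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(K key, V value) {
        delegate.write(null, value); // the key is intentionally dropped here
    }
}

final class KeySuppressDemo {
    public static void main(String[] args) {
        LineWriter<String, Integer> sink = new LineWriter<>();
        SimpleRecordWriter<String, Integer> writer = new KeySuppressingWriter<>(sink);
        writer.write("apple", 3);
        writer.write("pear", 7);
        sink.lines().forEach(System.out::println);
    }
}
```

Doing this by hand for every output format is exactly what I would like to avoid.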
EDIT: It might be unclear what I mean by a generic, reusable solution. Let me explain with an example:
Let's say I have a lot of prepared Mapper, Reducer, and OutputFormat classes. I want to combine them into different 'jobs' and run those jobs on different input files to produce various output files. In some cases (for some jobs) I need to suppress keys so that they are not written to the output file. I do not want to change the code of my mappers, reducers, or output formats; there are just too many of them for that. I need a generic solution that does not require changing the code of the given mappers, reducers, or output formats. How do I do that?