Is it possible to turn on mapper output compression with the new mapreduce API, and if so, could you please show how? I see a lot of examples that do this through the hadoop.mapred.JobConf API, but none for the mapreduce API.

If it is not configurable through the new API, is there something else I can do to get it to work?

Roman Nikitchenko

1 Answer

You can use the following code to enable map output compression:

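import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;

// Enables compression of the intermediate map output, using Snappy as the codec.
public static void enableMapOutputCompress(Job job) {
    job.getConfiguration().setBoolean("mapred.compress.map.output", true);
    job.getConfiguration().setClass("mapred.map.output.compression.codec",
            SnappyCodec.class, CompressionCodec.class);
}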

You can replace org.apache.hadoop.io.compress.SnappyCodec with another compression codec class, for example org.apache.hadoop.io.compress.GzipCodec or org.apache.hadoop.io.compress.LzoCodec.

I suggest using SnappyCodec.
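
For reference, here is a minimal sketch of the same setting applied in a driver before the Job is created. It assumes Hadoop 2.x, where the properties are named mapreduce.map.output.compress and mapreduce.map.output.compress.codec (to my knowledge the older mapred.* keys are still accepted there as deprecated aliases). The class name CompressedJobDriver and the job name are hypothetical, chosen only for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver class, for illustration only.
public class CompressedJobDriver {

    public static Job createJob(Configuration conf) throws IOException {
        // Hadoop 2.x property names (assumed target version).
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                SnappyCodec.class, CompressionCodec.class);

        // Job.getInstance takes its own copy of the configuration, so the
        // properties are set before the Job is created. Later changes must
        // go through job.getConfiguration() instead.
        return Job.getInstance(conf, "compressed-map-output");
    }
}
```

The mapper, reducer, input and output paths are then configured on the returned Job as usual.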

zsxwing
    Additional thanks for pointing me to the Snappy codec. It turned out to be really effective on reducer output. Mapper output compression barely passed the 'is it worth it' check. – Roman Nikitchenko Jun 28 '13 at 12:42
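
The reducer output compression mentioned in the comment above is configured separately from map output compression. A minimal sketch with the new API, assuming a file-based output format; the helper class name OutputCompression is hypothetical:

```java
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical helper class, for illustration only.
public class OutputCompression {

    // Compresses the final job (reducer) output files with Snappy.
    public static void enableJobOutputCompress(Job job) {
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
    }
}
```

Note that a Snappy-compressed plain text file is not splittable, so whether this is a good fit may depend on how the output is consumed downstream.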