I have an MR job that uses the MultipleOutputs format and writes 500 output files. I want to zip (compress) each of those files without merging them.

Tunaki
Pooja3101

1 Answer

You can use SequenceFileOutputFormat, an OutputFormat that writes keys and values to SequenceFiles in a binary format. Each output file stays separate and is compressed individually.

SequenceFile.CompressionType has three values:

BLOCK: Compress sequences of records together in blocks.

NONE: Do not compress records.

RECORD: Compress values only, each separately.
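To get a feel for why BLOCK usually compresses better than RECORD, here is a minimal sketch using only `java.util.zip.Deflater` from the JDK. It is not how Hadoop implements SequenceFile compression (real SequenceFiles also store keys, headers and sync markers); the class name, the helper methods and the sample data are all made up for illustration. It only compares compressing each value on its own against compressing the same values together.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Illustration only: mimics the RECORD vs BLOCK idea with plain Deflater.
// This is NOT Hadoop code and does not produce a SequenceFile.
final class CompressionGranularityDemo {

    // Deflate one buffer and return the compressed length in bytes.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Hypothetical sample data: short, similar values such as log lines.
    static byte[][] sampleValues(int count) {
        byte[][] values = new byte[count][];
        for (int i = 0; i < count; i++) {
            values[i] = ("user=" + i + " status=OK region=eu-west")
                    .getBytes(StandardCharsets.UTF_8);
        }
        return values;
    }

    // RECORD-like: every value is compressed on its own.
    static int recordCompressedSize(byte[][] values) {
        int total = 0;
        for (byte[] value : values) {
            total += deflatedSize(value);
        }
        return total;
    }

    // BLOCK-like: values are concatenated and compressed together,
    // so redundancy shared between records can be exploited.
    static int blockCompressedSize(byte[][] values) {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (byte[] value : values) {
            block.write(value, 0, value.length);
        }
        return deflatedSize(block.toByteArray());
    }

    public static void main(String[] args) {
        byte[][] values = sampleValues(200);
        System.out.println("per-record bytes: " + recordCompressedSize(values));
        System.out.println("per-block bytes:  " + blockCompressedSize(values));
    }
}
```

With many small, similar records the per-block total should come out clearly smaller, which is why BLOCK is the usual choice when output size matters.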

Key changes in your code:

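    // WORK_DIR_PREFIX and jobName come from your own job setup.
    Path outDir = new Path(WORK_DIR_PREFIX + "/out/" + jobName);
    // Write SequenceFiles instead of plain text output.
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    SequenceFileOutputFormat.setOutputPath(job, outDir);
    // Compress batches of records together within each output file.
    SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);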

Have a look at a working example of SequenceFileOutputFormat usage.

Ravindra babu