Hadoop mapreduce multipleoutputs write into a single file

Question

My MapReduce job uses MultipleOutputs to write files into three separate directories.
My reducer count is 400, which is idle for files written into two directories. For the third directory, I am trying to reduce the number of counter files written, since each file is tiny and 400 small counter files consume many blocks in HDFS, which I want to avoid.

I want to keep the reduce count the same and only reduce the files written into one directory. Does MapReduce support something like Spark's coalesce? Or can MultipleOutputs help in some way to write just 1 or 2 files instead of 400?

"My reducer count is 400, which is idle for files written into two directories" what does it mean - there are 400 reducers in total and they're all idle? Also what are "counter files"? — Danio, Jul 20 '21 at 22:17

score 0 · Answer 1 · answered Aug 02 '21 at 10:16

"I want to keep the reduce count the same and only reduce the files written into one directory."

Every reducer writes to its own file. If you want to reduce the number of files, you need to reduce the number of reducers.
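To illustrate the point, here is a minimal plain-Java sketch that simulates the file naming rather than calling the Hadoop API. It assumes the usual reducer-side suffix `-r-NNNNN` (the partition number padded to five digits) appended to the base output path. The class name, method names, and the base path `counters/part` are hypothetical and exist only for this illustration.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java simulation only; nothing here touches the Hadoop API.
class PartFileNames {

    // Assumed naming scheme: "<basePath>-r-<partition padded to 5 digits>".
    static String partFileName(String basePath, int partition) {
        return String.format(Locale.ROOT, "%s-r-%05d", basePath, partition);
    }

    // Distinct file names produced when every reducer writes to basePath once or more.
    static Set<String> filesWritten(String basePath, int reducerCount) {
        Set<String> files = new LinkedHashSet<>();
        for (int partition = 0; partition < reducerCount; partition++) {
            files.add(partFileName(basePath, partition));
        }
        return files;
    }

    public static void main(String[] args) {
        // One file per reducer: 400 reducers give 400 files, 2 reducers give 2.
        System.out.println(filesWritten("counters/part", 400).size()); // prints 400
        System.out.println(filesWritten("counters/part", 2).size());   // prints 2
    }
}
```

Because the partition number is part of each name, two reducers never share a file, so under this assumption the file count in that directory follows the number of reducers that write to it.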

Egor