I'm new to Hadoop MapReduce and am stuck on how to bin the output values of my mapper. The mapper emits (Text, IntWritable) pairs, with a dataset ID as the key and the length of that dataset's metadata description as the value. My goal is to count how many lengths fall into each of 3 bins: 1-200 characters, 201-400 characters, and 401+ characters. The mapper output looks like this (first column is the key, second column is the value, i.e. the metadata length):
1 256
2 344
3 234
4 160
5 432
6 121
7 551
8 239
9 283
10 80
...
Based on the ten values shown above, the binned result should be:
1-200 3
201-400 5
401-... 2
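To make the rule concrete, here is a plain-Java sketch of the binning I have in mind. It uses no Hadoop classes, and the class and method names (LengthBinning, binFor, countBins) are made up for illustration. I assume the same rule could be applied to each IntWritable value, with the bin label emitted as the key and the counts summed afterwards, but I don't know which stage is the right place for it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class, not part of any Hadoop API.
class LengthBinning {

    // Maps a metadata length to its bin label.
    // Assumption: lengths are always >= 1, so anything <= 200 lands in the first bin.
    static String binFor(int length) {
        if (length <= 200) {
            return "1-200";
        }
        if (length <= 400) {
            return "201-400";
        }
        return "401+";
    }

    // Counts how many lengths fall into each bin, keeping the bins in a fixed order.
    static Map<String, Integer> countBins(int[] lengths) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("1-200", 0);
        counts.put("201-400", 0);
        counts.put("401+", 0);
        for (int length : lengths) {
            counts.merge(binFor(length), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The ten sample lengths from the mapper output above.
        int[] sample = {256, 344, 234, 160, 432, 121, 551, 239, 283, 80};
        countBins(sample).forEach((bin, n) -> System.out.println(bin + "\t" + n));
    }
}
```

Run on the ten sample values, this gives the counts 3, 5 and 2 listed above.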
Any ideas on how to approach this? Should the binning happen in the Mapper's cleanup() method, in a Combiner, or in a Reducer? Code examples or pointers to other online sources would be appreciated. Thank you.