I am new to Hadoop and have been struggling to write a MapReduce job that finds the top N values for each key (the first column of the input). Any help or pointers on implementing this would be highly appreciated.
Input data:
a,1
a,9
b,3
b,5
a,4
a,7
b,1
Expected output:
a 1,4,7,9
b 1,3,5
I believe we should write a Mapper that reads each line, splits it into a key and a value, and emits that pair so the reducer receives all the values for a key. The reducer would then do the sorting and keep the top N.
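To make the idea concrete, here is a rough sketch of the flow I have in mind, written in plain Java without any Hadoop classes so it can be run on its own. The class name TopNSketch and the methods map, reduce and run are placeholders I made up, and the grouping step only imitates what I understand the shuffle phase to do. The reducer part keeps a min-heap of at most N values, so the smallest value is dropped whenever the heap grows past N.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java imitation of map -> shuffle -> reduce; no Hadoop API is used here.
class TopNSketch {

    // "Mapper" step: split one "key,value" line into a (key, value) pair.
    static Map.Entry<String, Integer> map(String line) {
        String[] parts = line.split(",");
        return new SimpleEntry<>(parts[0].trim(), Integer.parseInt(parts[1].trim()));
    }

    // "Reducer" step: keep the n largest values of one key, returned in ascending order.
    static List<Integer> reduce(Iterable<Integer> values, int n) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap
        for (int v : values) {
            heap.add(v);
            if (heap.size() > n) {
                heap.poll(); // drop the current smallest value
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll()); // polling a min-heap yields ascending order
        }
        return result;
    }

    // Imitates the shuffle: group mapped values by key, then reduce each group.
    static Map<String, List<Integer>> run(List<String> lines, int n) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> kv = map(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, List<Integer>> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            output.put(e.getKey(), reduce(e.getValue(), n));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a,1", "a,9", "b,3", "b,5", "a,4", "a,7", "b,1");
        // With n = 4 this prints "a 1,4,7,9" and then "b 1,3,5".
        for (Map.Entry<String, List<Integer>> e : run(input, 4).entrySet()) {
            String joined = e.getValue().stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining(","));
            System.out.println(e.getKey() + " " + joined);
        }
    }
}
```

In a real job I assume map and reduce would move into subclasses of Hadoop's Mapper and Reducer, with the framework doing the grouping by key, but I am not sure how best to wire that up or how to pass N in.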