I have implemented secondary sorting for my application.
File-1              File-2              File-3
------              ------              ------
name,pos,r,value    name,pos,r,value    name,pos,r,value
aa,1,0,123          aa,2,1,1            aa,3,1,11
bb,1,0,234          aa,2,2,34           aa,3,2,12
                    aa,2,3,55           aa,3,3,13
                    bb,2,1,99           bb,3,1,15
                    bb,2,2,54           bb,3,2,19
                    bb,2,3,32           bb,3,3,13
For every record in File-1, there are three records in each of File-2 and File-3.
Composite key: name + (pos+r)
Natural key: name
Sort order: by composite key, ascending on (pos+r).
The expected output is as follows.
For a given name (e.g. aa), the File-1 value comes first, followed by all File-2 values (the three aa rows ordered by pos+r), and then by all File-3 values (the three aa rows ordered by pos+r):
aa,123,1,34,55,11,12,13
bb,234,99,54,32,15,19,13
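
To make the intended ordering concrete, here is a minimal plain-Java sketch that reproduces the expected output without any Hadoop classes. It is only an illustration, not my actual job code: the class name `SecondarySortSketch` and the method `merge` are made up for this example, header lines are assumed to be dropped already, and "(pos+r)" is assumed to mean "order by pos, then by r" rather than the arithmetic sum.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SecondarySortSketch {

    // Stand-in for the sort comparator: natural key (name) first,
    // then pos, then r. Assumption: "(pos+r)" means "pos, then r".
    static final Comparator<String[]> SORT_ORDER =
            Comparator.<String[], String>comparing(f -> f[0])
                    .thenComparingInt(f -> Integer.parseInt(f[1]))
                    .thenComparingInt(f -> Integer.parseInt(f[2]));

    // Takes "name,pos,r,value" records from all three files, in any order,
    // and returns one line per name with the values in composite-key order.
    static List<String> merge(List<String> records) {
        List<String[]> parsed = new ArrayList<>();
        for (String line : records) {
            parsed.add(line.split(","));
        }
        parsed.sort(SORT_ORDER);

        // Stand-in for the grouping comparator: rows that share a name are
        // folded into a single output line, preserving the sorted order.
        Map<String, StringBuilder> groups = new LinkedHashMap<>();
        for (String[] f : parsed) {
            groups.computeIfAbsent(f[0], name -> new StringBuilder(name))
                  .append(',')
                  .append(f[3]);
        }

        List<String> out = new ArrayList<>();
        for (StringBuilder sb : groups.values()) {
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Records deliberately shuffled to mimic arbitrary mapper output order.
        List<String> input = List.of(
                "bb,3,2,19", "aa,2,3,55", "aa,1,0,123", "bb,2,1,99",
                "aa,3,1,11", "bb,1,0,234", "aa,2,1,1", "bb,3,3,13",
                "aa,3,3,13", "bb,2,3,32", "aa,2,2,34", "bb,3,1,15",
                "aa,3,2,12", "bb,2,2,54");
        merge(input).forEach(System.out::println);
    }
}
```

In the real job the sort step corresponds to the sort comparator on the composite key, and the fold-by-name step corresponds to the grouping comparator on the natural key.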
I have implemented this with secondary sorting, using setGroupingComparatorClass, setSortComparatorClass and a custom partitioner.
My doubts are:
1) How do I add a combiner for this scenario?
- As I understand it, grouping and sorting happen in the reduce phase, once all the map outputs (which are partitioned by natural key) have been transferred to the reducer machine.
2) If a combiner is added, how and when will the sorting happen so that the reduce function receives the outputs from all mappers in the proper order?
- Will the map outputs be sorted twice: once in the combiner, which is executed after every map, and again on the reducer side to sort all the combiner outputs?