Why use total order partitioning in Hadoop? In which scenarios do we need total order partitioning? My understanding is that with multiple reducers, each reducer's output is already sorted by key, so why do we need total order partitioning? It would be great if you could share a graphical representation or examples.
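Each reducer's output is sorted by key, but only within that reducer. With the default HashPartitioner, keys are assigned to reducers by hash, so the key ranges of different reducers overlap. Total order partitioning instead gives each reducer a contiguous key range, in order (every key sent to reducer 1 sorts before every key sent to reducer 2, and so on), so the output is sorted by key across all the reducers. This lets you combine the output of multiple reducers by simply concatenating it in reducer order and still get sorted output, which is what you need whenever you want one globally sorted result while using more than one reducer. A simple example: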
Without total order partitioning
reducer 1's output:
(a,val_a)
(m,val_m)
(x,val_x)
reducer 2's output:
(b,val_b)
(c,val_c)
If you combine the two outputs, the result is no longer sorted by key:
(a,val_a)
(m,val_m)
(x,val_x)
(b,val_b)
(c,val_c)
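Because each reducer's output is sorted on its own, a globally sorted result can still be recovered here, but only with a k-way merge rather than plain concatenation. A minimal Python sketch using the records from this example (the lists are just an in-memory stand-in for the two reducer output files):

```python
import heapq

# Reducer outputs from the "without total order partitioning" example.
# Each list is sorted by key on its own, as every reducer's output is.
reducer_1 = [("a", "val_a"), ("m", "val_m"), ("x", "val_x")]
reducer_2 = [("b", "val_b"), ("c", "val_c")]

# Plain concatenation keeps each reducer's block intact,
# so keys from different reducers end up out of order.
concatenated = reducer_1 + reducer_2

# A k-way merge of the already-sorted outputs restores the global order,
# but it must read all outputs side by side instead of just appending them.
merged = list(heapq.merge(reducer_1, reducer_2, key=lambda kv: kv[0]))

print([k for k, _ in concatenated])  # ['a', 'm', 'x', 'b', 'c']
print([k for k, _ in merged])        # ['a', 'b', 'c', 'm', 'x']
```

Total order partitioning avoids that merge step: the reducers' outputs can simply be appended in order.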
With total order partitioning
reducer 1's output:
(a,val_a)
(b,val_b)
(c,val_c)
reducer 2's output:
(m,val_m)
(x,val_x)
If you combine the two outputs, the result is still sorted by key:
(a,val_a)
(b,val_b)
(c,val_c)
(m,val_m)
(x,val_x)
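A minimal Python simulation of range partitioning that reproduces the reducer outputs above, assuming a single hypothetical split point "m" for two reducers. Hadoop's TotalOrderPartitioner reads sorted split points from a partition file (typically produced by sampling the input) and finds each key's range by binary search; `bisect_right` and the `range_partition` helper below are stand-ins for that lookup, not Hadoop's own code:

```python
from bisect import bisect_right

def range_partition(key, split_points):
    """Return the reducer index for key (hypothetical helper).

    split_points must be sorted; n reducers need n - 1 split points.
    bisect_right sends a key equal to a split point to the higher reducer.
    """
    return bisect_right(split_points, key)

# Hypothetical map output, in arbitrary order.
records = [("a", "val_a"), ("m", "val_m"), ("x", "val_x"),
           ("b", "val_b"), ("c", "val_c")]
split_points = ["m"]  # assumption: one split point -> 2 reducers
num_reducers = len(split_points) + 1

partitions = [[] for _ in range(num_reducers)]
for key, value in records:
    partitions[range_partition(key, split_points)].append((key, value))

# The shuffle sorts each reducer's input by key.
reducer_outputs = [sorted(p) for p in partitions]

# Concatenating in reducer order gives a globally sorted result.
combined = [kv for out in reducer_outputs for kv in out]

print(reducer_outputs[0])         # [('a', 'val_a'), ('b', 'val_b'), ('c', 'val_c')]
print(reducer_outputs[1])         # [('m', 'val_m'), ('x', 'val_x')]
print([k for k, _ in combined])  # ['a', 'b', 'c', 'm', 'x']
```

In a real job the split points would be chosen so that each reducer gets a similar amount of data; a single hand-picked "m" is only for illustration.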

Jagrut Sharma
- Thanks a lot, well explained. Quick question: how do I combine the two reducers' output? Do you mean something like hadoop fs -cat /hadfspath/totalpartition_outputpath/* ? – Learn Hadoop Apr 29 '18 at 15:54
- You can use this approach to merge the reducer output files: "hadoop fs -getmerge /example_hive/merge_case local_combined.ext". This gets all files in HDFS at the location /example_hive/merge_case, merges them, and downloads the result locally as the file local_combined.ext. – Jagrut Sharma Apr 29 '18 at 19:10
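Whatever tool does the merge, the combined file is only globally sorted if the part files are concatenated in partition order. A small local Python sketch of that concatenation, assuming a hypothetical job output directory with one tab-separated part file per reducer (`merge_part_files` is an illustrative helper, not a Hadoop API). It sorts file names explicitly; zero-padded names like part-r-00000, part-r-00001 make name order match partition order:

```python
import os
import tempfile

def merge_part_files(output_dir, merged_path):
    """Concatenate the part-* files in output_dir into merged_path.

    Files are processed in file-name order; non-part files such as a
    _SUCCESS marker are skipped. Returns the part file names used.
    """
    part_names = sorted(n for n in os.listdir(output_dir) if n.startswith("part-"))
    with open(merged_path, "w") as merged:
        for name in part_names:
            with open(os.path.join(output_dir, name)) as part:
                merged.write(part.read())
    return part_names

# Hypothetical job output mirroring the total-order example above.
with tempfile.TemporaryDirectory() as job_output:
    files = {
        "part-r-00000": "a\tval_a\nb\tval_b\nc\tval_c\n",
        "part-r-00001": "m\tval_m\nx\tval_x\n",
        "_SUCCESS": "",
    }
    for name, text in files.items():
        with open(os.path.join(job_output, name), "w") as f:
            f.write(text)

    merged_path = os.path.join(job_output, "local_combined.txt")
    used = merge_part_files(job_output, merged_path)
    with open(merged_path) as f:
        keys = [line.split("\t")[0] for line in f]

print(used)  # ['part-r-00000', 'part-r-00001']
print(keys)  # ['a', 'b', 'c', 'm', 'x']
```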