Hive job running forever on reduce phase

Question

I am running a query on a 60 GB dataset in Hive. When I fire the query, the 270 mappers finish within 15 minutes, but once the job reaches the reduce phase, each 0.01% of reducer progress takes at least 45 minutes to 1 hour, so the job seems to run forever. Is there any way to fix this?

Look, there may be some issue in the join; see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization — sandeep rawat, Jul 23 '16 at 13:02

score 0 · Answer 1 · edited Apr 19 '18 at 12:30

Why don't you use a combiner and a partitioner?

Case:
You can use a combiner, which is a mini reduce phase that runs on each mapper's output.
For example: if each mapper produces 100 lines of data and you use a combiner to aggregate them, the output is reduced to one line per mapper, so 1 line * 270 (mappers) = 270 lines, and that is what is fed as input to the reducers.
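The arithmetic above can be sketched in plain Java. This is only an illustration, not Hadoop or Hive API code: the class `CombinerSketch` and its methods are hypothetical names, and it assumes every record from a mapper shares one key and that the aggregate is a simple count (a combiner only suits aggregates such as sum, count, or max that can be applied in stages).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of Hadoop or Hive.
class CombinerSketch {

    // Stand-in for a combiner: counts occurrences per key within ONE mapper's output.
    static Map<String, Integer> combine(List<String> mapperOutputKeys) {
        Map<String, Integer> partial = new HashMap<>();
        for (String key : mapperOutputKeys) {
            partial.merge(key, 1, Integer::sum);
        }
        return partial;
    }

    // Number of records that reach the reduce side, with or without the combiner.
    static int reducerInputRecords(int mappers, int recordsPerMapper, boolean useCombiner) {
        int total = 0;
        for (int m = 0; m < mappers; m++) {
            List<String> output = new ArrayList<>();
            for (int i = 0; i < recordsPerMapper; i++) {
                output.add("k"); // assumption: all records share the same key
            }
            total += useCombiner ? combine(output).size() : output.size();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(reducerInputRecords(270, 100, false)); // prints 27000
        System.out.println(reducerInputRecords(270, 100, true));  // prints 270
    }
}
```

With more distinct keys per mapper the reduction is smaller, since a combiner emits one record per key per mapper. In Hive the comparable mechanism is map-side aggregation, controlled by the `hive.map.aggr` setting.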
Case:
You can use a partitioner to partition the data based on the key (if unique) or on the value (by range), for example: if value > 20, return 0; else return 1. This way the data is spread across more reducers instead of piling up on one.
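The range rule above can be sketched as follows. Again this is plain Java for illustration only, not the Hadoop `Partitioner` class: `RangePartitionerSketch`, `getPartition`, and `countPerPartition` are hypothetical names, and the threshold 20 is taken from the example in the answer.

```java
import java.util.Arrays;

// Hypothetical illustration; not part of Hadoop or Hive.
class RangePartitionerSketch {

    // The rule from the answer: values above 20 go to partition 0, the rest to partition 1.
    static int getPartition(int value) {
        return value > 20 ? 0 : 1;
    }

    // Counts how many records each partition (that is, each reducer) would receive.
    static int[] countPerPartition(int[] values, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int value : values) {
            counts[getPartition(value)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] values = {5, 20, 21, 100};
        System.out.println(Arrays.toString(countPerPartition(values, 2))); // prints [2, 2]
    }
}
```

Checking the per-partition counts like this is a quick way to see whether a chosen rule balances the load; if one partition receives almost all records, that reducer will still dominate the run time.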
