I am running a query on a 60 GB dataset in Hive. When I fire the query, the 270 mappers finish within 15 minutes, but once it reaches the reduce stage, each 0.01% of progress takes at least 45 minutes to 1 hour, so the job seems to run forever. Is there any way to fix this?

BruceWayne
- Look, there seems to be some issue in the join; see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization – sandeep rawat Jul 23 '16 at 13:02
- How many reducers are running? – JiminyCricket Jul 23 '16 at 13:38
- Total reducers: 1033, of which 267 are in the running state. – BruceWayne Jul 23 '16 at 14:01
1 Answer
Why don't you use a combiner and a partitioner?
Case 1:
You can use a combiner, which is a mini reduce phase that runs on each mapper's output.
For example, if each mapper produces 100 lines of data and you use a combiner to aggregate them, each mapper's output is reduced to one line, so 1 line * 270 (mappers) = 270 lines are fed as input to the reducers.
Case 2:
You can use a partitioner to partition the data based on a key (if unique) or a value (in a range), e.g. if value > 20 return 0; else return 1. This way we will have more reducers processing the data.
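
To make the arithmetic concrete, here is a rough sketch of both ideas. It is plain Python that only simulates the data flow, not Hadoop or Hive API code. The 270 mappers and 100 lines per mapper come from the numbers above; the `mapper_output` function, the single `"total"` key, and the sum aggregate are assumptions made up for illustration.

```python
from collections import defaultdict

NUM_MAPPERS = 270        # mapper count from the question
LINES_PER_MAPPER = 100   # lines per mapper from the example above


def mapper_output(mapper_id):
    """Hypothetical mapper: emits 100 (key, value) pairs under one key."""
    return [("total", v) for v in range(LINES_PER_MAPPER)]


def combine(pairs):
    """Mini reduce over a single mapper's output: sum the values per key."""
    sums = defaultdict(int)
    for key, value in pairs:
        sums[key] += value
    return list(sums.items())


def partition(value, threshold=20):
    """Range partitioner from case 2: value > 20 goes to reducer 0, else 1."""
    return 0 if value > threshold else 1


# Case 1: lines reaching the reducers with and without a combiner.
lines_without_combiner = NUM_MAPPERS * LINES_PER_MAPPER
combined = [
    pair
    for m in range(NUM_MAPPERS)
    for pair in combine(mapper_output(m))
]
lines_with_combiner = len(combined)

# Case 2: route one mapper's raw values to reducers by value range.
buckets = defaultdict(list)
for _, value in mapper_output(0):
    buckets[partition(value)].append(value)

print(lines_without_combiner, lines_with_combiner)
print({reducer: len(values) for reducer, values in buckets.items()})
```

A combiner only gives a correct result when the aggregate can be applied in stages (sum, count, min, max); something like an average needs to carry sum and count separately. Also note that a two-way split like the one above only balances the load if the values are spread fairly evenly on both sides of the threshold.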

mnille