
I am running a query on a 60 GB dataset in Hive. When I fire the query, the 270 mappers finish within 15 minutes, but once the job reaches the reduce phase it takes at least 45 minutes to 1 hour just to reach 0.01% completion, so the job runs practically forever. Is there any way to fix this?

BruceWayne

1 Answer


Why don't you use a combiner and a partitioner?

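  1. Combiner:
    You can use a combiner, which is a mini reduce phase that runs on each mapper's output before it is sent to the reducers.
    For example, if each mapper produces 100 lines with the same key and the combiner aggregates them, they collapse to 1 line per mapper, so only 1 line * 270 mappers = 270 lines are fed to the reducers. This only works for aggregations that are associative and commutative, such as sums and counts.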

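  2. Partitioner:
    You can use a partitioner to decide which reducer each record goes to, based on the key (if it is unique) or on value ranges, e.g. if value > 20 return 0, else return 1. This spreads the data across several reducers (two in that example) instead of having it all land on one.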
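To make the two cases concrete, here is a minimal Python simulation of the idea. It is not Hadoop or Hive code, and it assumes each of the 270 mappers emits 100 `(key, 1)` pairs that all share one key. The names `map_phase`, `combine`, `value_range_partitioner` and `shuffle` are hypothetical. In real Hadoop MapReduce a combiner is registered with `Job.setCombinerClass`, and a partitioner is a subclass of `Partitioner` that overrides `getPartition(key, value, numPartitions)` and is registered with `Job.setPartitionerClass`.

```python
from collections import defaultdict

NUM_MAPPERS = 270        # mapper count from the question
LINES_PER_MAPPER = 100   # lines per mapper from the combiner example

def map_phase():
    # Hypothetical mapper: every input line becomes ("total", 1).
    return [("total", 1)] * LINES_PER_MAPPER

def combine(pairs):
    # Mini reduce on ONE mapper's output: sum the values for each key.
    # Valid only because summing is associative and commutative.
    sums = defaultdict(int)
    for key, value in pairs:
        sums[key] += value
    return list(sums.items())

def value_range_partitioner(key, value, num_reducers):
    # The answer's rule: value > 20 -> reducer 0, otherwise reducer 1.
    # The modulo keeps the result in [0, num_reducers) even with one reducer.
    return (0 if value > 20 else 1) % num_reducers

def shuffle(pairs, partitioner, num_reducers):
    # Route each pair to the reducer chosen by the partitioner.
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partitioner(key, value, num_reducers)].append((key, value))
    return buckets

# Pairs sent to the reducers without and with a combiner.
without_combiner = [p for _ in range(NUM_MAPPERS) for p in map_phase()]
with_combiner = [p for _ in range(NUM_MAPPERS) for p in combine(map_phase())]

print(len(without_combiner))  # 27000
print(len(with_combiner))     # 270

# Hypothetical records split by value range across two reducers.
sample = [("a", 5), ("b", 42), ("c", 21), ("d", 20)]
buckets = shuffle(sample, value_range_partitioner, 2)
print([len(b) for b in buckets])  # [2, 2]
```

The combiner shrinks the shuffle from 27,000 pairs to 270 while the reducers still compute the same total (27,000), and the partitioner splits the sample records evenly between the two reducers instead of sending everything to one.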

mnille