Optimizing the Hive query: Apache Hive

Asked Apr 03 '17 at 09:06

Active Apr 03 '17 at 14:49

Viewed 115 times

The following Hive query computes the lag and lead on a single column. The query spawns 1 mapper and 50 reducers. How can I optimize the query so that it spawns fewer reducers?

Table description

# col_name              data_type               comment

a                       int

Data in table

select * from foo;
OK
foo.a
1
2
3
4
5
6
3
4
6
78
9
7
NULL

select lag(a,1) over (order by a) as next,lead(a,1) over (order by a) as prev from foo;

Query ID = phodisvc_20170403015502_de129135-eb19-4c4d-8161-c3f217a45928
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Defaulting to jobconf value of: 50
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job -kill job_1489146839620_136214
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 50

edited Apr 03 '17 at 13:49

David דודו Markovitz

asked Apr 03 '17 at 09:06

wandermonk

What is the file type? – Kanagaraj Dhanapal Apr 03 '17 at 13:12
Example: ORC, RCFILE, TEXTFILE, etc. – Kanagaraj Dhanapal Apr 03 '17 at 13:13
Please share `show create table mytable;` and `show table extended like mytable;` – David דודו Markovitz Apr 03 '17 at 13:47
And check `mapreduce.job.reduces` – David דודו Markovitz Apr 03 '17 at 14:50
@DuduMarkovitz Setting the parameter `mapreduce.job.reduces` worked for me. Thanks! – wandermonk Apr 04 '17 at 09:23
Setting it is the obvious fix; the interesting part here is why you got 50 reducers in the first place. Was this parameter set to 50? – David דודו Markovitz Apr 04 '17 at 09:25
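
For reference, a minimal sketch of the fix discussed in the comments, run in the same Hive session as the query. The log line "Defaulting to jobconf value of: 50" suggests that `mapreduce.job.reduces` was already set to 50 somewhere in the job configuration, though the thread never confirms this. The value 1 below is an assumption chosen for this tiny table; with `order by` and no `partition by` in the window, all rows are expected to land on a single reducer anyway.

```sql
-- Print the current value; a positive number here would explain the 50 reducers.
set mapreduce.job.reduces;

-- Pin the reducer count for this session (1 is an illustrative choice, not a general recommendation).
set mapreduce.job.reduces=1;

-- Alternatively, -1 asks Hive to estimate the count from the input size:
-- set mapreduce.job.reduces=-1;

-- The query from the question, unchanged.
select lag(a,1) over (order by a) as next,
       lead(a,1) over (order by a) as prev
from foo;
```

Running `explain` on the query before and after the change is one way to check which keys the reduce stage partitions on.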
