
The following Hive query computes the lag and lead of a single column. It spawns 1 mapper and 50 reducers. How can I optimize the query so that it spawns fewer reducers?

Table description

# col_name              data_type               comment

a                       int

Data in table

select * from foo;
OK
foo.a
1
2
3
4
5
6
3
4
6
78
9
7
NULL

select lag(a,1) over (order by a) as next,lead(a,1) over (order by a) as prev from foo;

Query ID = phodisvc_20170403015502_de129135-eb19-4c4d-8161-c3f217a45928
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Defaulting to jobconf value of: 50
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job -kill job_1489146839620_136214
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 50
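The log output names three settings that control the reducer count. As a minimal sketch, assuming the goal is a single reducer for this session only and that the cluster permits per-session overrides, the last of those settings could be applied before the same query. The value 1 is an assumption chosen for illustration, not something taken from the log. Since the window has no PARTITION BY clause, the ordering spans the whole table, so extra reducers are unlikely to share the work.

```sql
-- Assumption: a fixed reducer count of 1 is acceptable for this session.
-- The setting name comes from the job log; the value is illustrative.
set mapreduce.job.reduces=1;

-- Same query as in the question, unchanged.
select lag(a,1) over (order by a) as next,
       lead(a,1) over (order by a) as prev
from foo;
```

If a hard cap is preferred over a fixed count, `hive.exec.reducers.max` from the same log message may be the more suitable setting, although the log suggests the count of 50 comes from the job configuration rather than from Hive's own estimate.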

David דודו Markovitz
wandermonk
