The following Hive query computes the lag and lead on a single column. It spawns 1 mapper and 50 reducers. How can I optimize the query so that it spawns fewer reducers?
Table description:
# col_name data_type comment
a int
Data in table:
select * from foo;
OK
foo.a
1
2
3
4
5
6
3
4
6
78
9
7
NULL
select lag(a,1) over (order by a) as prev, lead(a,1) over (order by a) as next from foo;
Query ID = phodisvc_20170403015502_de129135-eb19-4c4d-8161-c3f217a45928
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Defaulting to jobconf value of: 50
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job -kill job_1489146839620_136214
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 50
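For reference, the hints printed in the log could be applied roughly as follows. This is only a sketch, not a confirmed fix: the value 1 is an assumption, as is the cluster allowing these properties to be overridden per session. The reasoning behind trying a single reducer is that a window with ORDER BY and no PARTITION BY places every row in one partition, so only one reducer receives data anyway.

```sql
-- Sketch only: run in the same Hive session, before the query.
-- Assumption: session-level overrides are permitted on this cluster.

-- Option A: force a constant number of reducers (1 is an assumed value).
set mapreduce.job.reduces=1;

-- Option B (alternative to A): -1 is the "not specified" value, which
-- should let Hive estimate the reducer count from the input size instead
-- of defaulting to the jobconf value of 50.
-- set mapreduce.job.reduces=-1;

-- Then rerun the lag/lead query above unchanged and check the
-- "number of reducers" reported for Stage-1.
```

If the log afterwards still reports 50 reducers, the setting is probably being overridden elsewhere in the cluster configuration.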