
I am running Hive jobs on a Hadoop cluster. I recently learned that performance can improve or change if you pay attention to the behavior of the mappers and reducers. Until now I have only worked with Hive, executing queries with the default mapper and reducer settings.

Now that I know about mappers and reducers, I am wondering what values to set for them so that performance changes. I am also wondering: do these need to be set on the master node only, or on all nodes?

Can anyone with experience of this explain the scenario to me?

Also, what other parameters do we need to set while executing jobs?

Bhavesh Shah
  • Do you mean setting the number of mappers and reducers? – David Gruzman May 08 '12 at 20:52
  • Yes. Actually I tried to set it, but it is not reflected in the respective job XML file. Every job creates its own XML file; correct me if I am wrong. I checked my logs and found that all the Hadoop environment variables I had set appear in the respective XML. Where do I run the command to set these variables (set mapred.map.tasks, ...)? I am setting it as /home/hadoop/hive-0.7.1/bin/hive -e 'set mapred.map.tasks'? Is this correct? – Bhavesh Shah May 09 '12 at 04:41

1 Answer


To the best of my understanding, the number of mappers is not something you set per job. It is calculated by the JobTracker, taking into account the number of slots per node (something you set cluster-wide in mapred-site.xml), the number of input splits you have, and other jobs (if you use the Fair or Capacity Scheduler, your queue parameters are also taken into account).
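As a rough, hedged sketch of the split-based calculation described above (the function name, default block size, and parameters here are illustrative assumptions, not Hadoop's exact FileInputFormat logic):

```python
import math

def estimate_num_mappers(file_sizes, block_size=64 * 1024 * 1024):
    """Illustrative sketch: each input file is cut into splits of
    roughly block_size bytes, and each split gets one map task.
    The real FileInputFormat logic also honors min/max split settings."""
    return sum(max(1, math.ceil(size / block_size)) for size in file_sizes)

# One 200 MB file with a 64 MB block size -> 4 splits -> ~4 mappers.
print(estimate_num_mappers([200 * 1024 * 1024]))  # prints 4
```

This is why forcing mapred.map.tasks per job often has no visible effect: the split count, not your setting, drives the mapper count.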
The number of reducers affects the results, and therefore you can set it per job, with the following command:
set mapred.reduce.tasks=128
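For example, the per-job setting can be placed at the top of a Hive script so it applies only to that session (a sketch; the table and query below are hypothetical):

```sql
-- Applies only to jobs launched in this Hive session.
SET mapred.reduce.tasks=128;

-- Hypothetical query: the reduce stage of this job uses 128 reducers.
SELECT dept, COUNT(*) FROM employees GROUP BY dept;
```

Putting the value in mapred-site.xml instead would make it the cluster-wide default for all jobs, which is usually not what you want for one-off tuning.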

David Gruzman
  • What I did is set this value via "/home/hadoop/hive-0.7.1/bin/hive -e 'set mapred.reduce.tasks=128'". When I run my job I see that the job's XML file has a different value than the one I set. Actually I want to increase the performance of job execution, so I am trying to play with mappers and reducers. I want to set all these values in mapred-site.xml so that the properties apply to all jobs that execute. What should I do for that? Where should I execute these commands (I mean, which path)? – Bhavesh Shah May 09 '12 at 10:01
  • Please look at the following answer (mine): http://stackoverflow.com/questions/10448204/how-to-increase-the-mappers-and-reducers-in-hadoop-according-to-number-of-instan/10469029#10469029 – David Gruzman May 09 '12 at 11:06
  • "Fair or (not of) Capacity Scheduler", please. Can't edit this answer because the change is less than 6 characters. :-( – zeekvfu Nov 23 '13 at 04:36