I am using an Intel(R) Core(TM)2 Duo processor and have installed Hadoop in pseudo-distributed mode. I have written a program that needs 50 mappers. Is it possible to run 50 mappers in pseudo-distributed mode, or will I be limited to 4 (2 * the number of cores)? I have tried setting "mapred.tasktracker.map.tasks.maximum" to 50, but there is no change in concurrency.
1 Answer
The maximum number of concurrently running map and reduce tasks depends on the number of task trackers in your cluster and on the per-node limits defined by the properties mapreduce.tasktracker.map.tasks.maximum and mapreduce.tasktracker.reduce.tasks.maximum.
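For illustration (this is not part of the original setup, just a minimal Java sketch assuming the Hadoop configuration files are on the classpath), you can print the per-TaskTracker slot limits that your mapred-site.xml actually defines; the fallback of 2 matches Hadoop's shipped default:

```java
import org.apache.hadoop.conf.Configuration;

public class SlotConfigCheck {
    public static void main(String[] args) {
        // Loads core-site.xml / mapred-site.xml found on the classpath
        Configuration conf = new Configuration();
        // 2 is Hadoop's default if the property is not set
        int mapSlots = conf.getInt("mapreduce.tasktracker.map.tasks.maximum", 2);
        int reduceSlots = conf.getInt("mapreduce.tasktracker.reduce.tasks.maximum", 2);
        System.out.println("Map slots per TaskTracker:    " + mapSlots);
        System.out.println("Reduce slots per TaskTracker: " + reduceSlots);
    }
}
```

If this still prints 2 after your change, the file you edited is probably not the one the TaskTracker reads, or the daemon was not restarted.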
I assume your MapReduce job needs 50 map tasks with the default block size configuration. The number of map tasks for a job is determined by the number of InputSplits of the data being processed, so you should not rely on a particular number of map tasks or hard-code such a limit in your program; doing so would hurt the scalability of your MapReduce job. A rough estimate is shown in the sketch below.
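As a back-of-the-envelope sketch (the input and block sizes here are assumptions for illustration, not taken from the question): with FileInputFormat one map task is created per input split, and a split is normally one HDFS block, so the task count falls out of simple arithmetic:

```java
public class SplitEstimate {
    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // older Hadoop default dfs.block.size: 64 MB (assumption)
        long inputSize = 50L * blockSize;     // ~3.2 GB of input data (assumption)
        // one map task per split, i.e. ceil(inputSize / blockSize)
        long splits = (inputSize + blockSize - 1) / blockSize;
        System.out.println("Expected map tasks: " + splits); // prints 50
    }
}
```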
One option would be to set the maximum number of map slots to 50. The number of available map slots is visible in the cluster summary section of the JobTracker web UI (or programmatically, as in the sketch below). However, since your processor has only two cores, you should reconsider whether launching 50 mappers concurrently will actually improve the performance of your MapReduce job.
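If you prefer to check from code rather than the web UI, a hedged sketch using the old org.apache.hadoop.mapred API queries the JobTracker for the same cluster-capacity figures the cluster summary shows, so you can verify whether your 50-slot setting was picked up:

```java
import java.io.IOException;
import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ClusterCapacityCheck {
    public static void main(String[] args) throws IOException {
        // Connects to the JobTracker named in the local configuration
        JobClient client = new JobClient(new JobConf());
        ClusterStatus status = client.getClusterStatus();
        System.out.println("Task trackers:    " + status.getTaskTrackers());    // 1 in pseudo-distributed mode
        System.out.println("Max map slots:    " + status.getMaxMapTasks());     // trackers * map.tasks.maximum
        System.out.println("Max reduce slots: " + status.getMaxReduceTasks());
    }
}
```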
