As per the documentation, the following properties define the number of map/reduce task slots on a data node:
mapred.tasktracker.map.tasks.maximum | mapred.map.tasks
Also, the number of mapper objects is determined by the number of input splits in the MapReduce job. We implement the map/reduce functions, and the framework creates the objects and schedules them as close to the data blocks as possible.
So what is the difference between map task slots and the mapper objects created by the framework?
Let's say I am storing a 2 GB file across 5 data nodes, each node holding 400 MB.
If I define dfs.block.size = 100 MB,
then each node will hold 400/100 = 4 data blocks. Out of these 4 data blocks we can ideally have 4 input splits, and in turn 4 mapper objects per node. At the same time, if I define mapred.tasktracker.map.tasks.maximum = 2 and mapred.map.tasks = 2, what conclusion can I draw? Can I say that the 4 mapper objects are to be shared among the 2 map task slots, i.e. run 2 at a time? I might be going in the wrong direction; any clarification would be helpful.
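For reference, this is roughly the configuration I am describing, written as a sketch against the old (pre-YARN) property names in hdfs-site.xml and mapred-site.xml; the values are just the ones from my example, and I am assuming dfs.block.size takes bytes:

```xml
<!-- hdfs-site.xml -->
<!-- Block size of 100 MB, expressed in bytes (100 * 1024 * 1024) -->
<property>
  <name>dfs.block.size</name>
  <value>104857600</value>
</property>

<!-- mapred-site.xml -->
<!-- Maximum number of map task slots a TaskTracker runs concurrently -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<!-- Hint for the number of map tasks; my understanding is the actual
     number is driven by the input splits, not by this value -->
<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
</property>
```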