
As per theory, the following properties define the number of map task slots on a data node: mapred.tasktracker.map.tasks.maximum and mapred.map.tasks.
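For reference, these properties are set in mapred-site.xml. A minimal fragment, with example values only (not recommendations):

```xml
<!-- mapred-site.xml: illustrative values -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
  <!-- map slots: how many map tasks one TaskTracker runs concurrently -->
</property>
<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
  <!-- only a hint for the total number of map tasks;
       the actual count comes from the input splits -->
</property>
```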

Also, the number of mapper objects is determined by the number of input splits in the MapReduce job. We implement the map/reduce functions, and the framework creates the mapper objects and runs them as close as possible to the data blocks.

So what is the difference between map task slots and the mapper objects created by the framework?

Let's say I am storing a 2TB file across 5 data nodes, each node holding 400MB. If I define dfs.block.size = 100MB, then each node will hold 400/100 = 4 data blocks. Out of these 4 data blocks we can ideally have 4 input splits, and in turn 4 mapper objects per node. At the same time, if I define mapred.tasktracker.map.tasks.maximum = 2 and mapred.map.tasks = 2, what conclusion can I draw from that? Can I say the 4 mapper objects are to be shared among 2 map task slots? I might be going in the wrong direction; any clarification would be helpful.
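The per-node arithmetic in the question can be sketched as follows. This is a toy calculation using the question's numbers, not Hadoop code; all variable names are illustrative:

```python
import math

# Numbers taken from the question above (per data node).
node_storage_mb = 400      # data stored on one node
block_size_mb = 100        # dfs.block.size
map_slots_per_node = 2     # mapred.tasktracker.map.tasks.maximum

# One input split per block by default, one map task per split.
blocks_per_node = node_storage_mb // block_size_mb   # 4 blocks
map_tasks_per_node = blocks_per_node                 # 4 map tasks

# With only 2 slots, the 4 map tasks cannot all run at once:
# they execute in "waves" of at most map_slots_per_node tasks.
waves = math.ceil(map_tasks_per_node / map_slots_per_node)

print(blocks_per_node, map_tasks_per_node, waves)  # 4 4 2
```

So under these assumptions, the node runs its 4 map tasks two at a time, in 2 waves.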

1 Answer


The map slots determine how many map tasks a TaskTracker can run concurrently. The number of map tasks is determined by the input splits, and you can't change it directly. If there are more map tasks than map slots, some map tasks will wait and run only after other tasks have finished.
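This queueing behavior can be illustrated with a toy model (an assumption for illustration, not Hadoop's actual scheduler): tasks beyond the slot count simply wait for a slot to free up.

```python
from collections import deque

# Toy model: one TaskTracker with 2 map slots and 4 pending map tasks.
slots = 2
tasks = deque(["map-0", "map-1", "map-2", "map-3"])
schedule = []  # groups of tasks that run together, wave by wave

while tasks:
    # Fill the available slots; remaining tasks stay queued until the
    # running wave finishes.
    wave = [tasks.popleft() for _ in range(min(slots, len(tasks)))]
    schedule.append(wave)

print(schedule)  # [['map-0', 'map-1'], ['map-2', 'map-3']]
```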

mashuai
  • Thank you for clarifying. So technically, what is the difference between map slots and map tasks? Can I say map tasks are nothing but multiple JVMs (to run multiple mapper objects)? And what is a SLOT then? – user3159843 Apr 22 '14 at 11:13
  • Hadoop uses a `slot` to divide computer resources. Computer resources include memory and CPU. – mashuai Apr 22 '14 at 15:07