2

recently I am running some benchmarks to learn about failover mechanism in Giraph.

Actually I'm curious; when a worker in a job gets slower, the other workers will just wait for it. Later I found something like this in GiraphJob.java:

// Speculative execution doesn't make sense for Giraph
giraphConfiguration.setBoolean("mapred.map.tasks.speculative.execution", false);

Does anyone know why speculative execution is not enabled in Giraph?

Thanks

Raptor
  • 53,206
  • 45
  • 230
  • 366
Algorithman
  • 1,309
  • 1
  • 16
  • 39

1 Answers1

1

At first lets bring back in our mind what speculative execution is. Quoted from Yahoo's Hadoop tutorial:

Speculative execution: One problem with the Hadoop system is that by dividing the tasks across many nodes, it is possible for a few slow nodes to rate-limit the rest of the program. For example if one node has a slow disk controller, then it may be reading its input at only 10% the speed of all the other nodes. So when 99 map tasks are already complete, the system is still waiting for the final map task to check in, which takes much longer than all the other nodes. By forcing tasks to run in isolation from one another, individual tasks do not know where their inputs come from. Tasks trust the Hadoop platform to just deliver the appropriate input. Therefore, the same input can be processed multiple times in parallel, to exploit differences in machine capabilities. As most of the tasks in a job are coming to a close, the Hadoop platform will schedule redundant copies of the remaining tasks across several nodes which do not have other work to perform. This process is known as speculative execution. When tasks complete, they announce this fact to the JobTracker. Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the TaskTrackers to abandon the tasks and discard their outputs. The Reducers then receive their inputs from whichever Mapper completed successfully, first. Speculative execution is enabled by default. You can disable speculative execution for the mappers and reducers by setting the mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution JobConf options to false, respectively

If I got Giraph right they don't use speculative execution because they use their own iterative calculation paradigm where it didn't fit in. This paradigm is inspired by google's pregel which provides a more graph node centric view on the data. Furthermore fault-tolerance is created by checkpointing which means that each iteration, also called superstep, calculates all incoming messages per graph node and afterwards the messages are distributed between the nodes.

Simply spoken MapReduce is not used in it's original way and therefore speculative execution for giraph makes no sense.

Matthias Kricke
  • 4,931
  • 4
  • 29
  • 43
  • It makes sense and in the old literature of BSP (esp. with workstealing), speculative execution is recommend. Why? Because one lagging behind task can totally delay the whole superstep (same goes with rollbacks of calculations). This is a limitation of Giraph's messaging model, nothing else. – Thomas Jungblut Oct 27 '14 at 10:49
  • I just want to make sure. In this case, what it means by iterative calculation is: Giraph must process each of its vertex iteratively. Therefore, if speculative execution was enabled in Giraph, then it would violate Giraph's rule by processing things not in order. Right? What would happen if speculative execution is enabled then? Would it cause inconsistency? – Algorithman Oct 27 '14 at 11:01
  • @Vincentius: I'm not sure if i get you right. But be aware, Giraph is not processing the vertexes iteratively but its algorithm steps. For each of those steps every node in the graph (simply spoken) analyzes it's incoming messages. This is done asynchronously and in parallel for every node. Enabling speculative execution simply didn't make sense since Giraph is not capable of using it. – Matthias Kricke Oct 28 '14 at 07:43