If a decision task is picked up by a workflow worker that has never executed that workflow before, doesn't that mean the worker has to pull the workflow's history from the Cadence service to restore its state before it can execute the decision task? Wouldn't that be very slow? Does Cadence have any mechanism to ensure that all decision tasks of a workflow execution land on the same workflow worker?
1 Answer
Yes, Cadence (and the same applies to its younger brother, Temporal) does ensure that a workflow execution keeps being processed by the same workflow worker for as long as that is reasonably possible. The mechanism is called Sticky Execution, and it is enabled by default.
Internally, a new task queue is created for that specific worker, and the workflow instance is silently reassigned to that worker-specific task queue (the "sticky queue"). From then on, any Workflow Task for that workflow instance is written to the sticky queue rather than to the regular task queue it was originally targeted at. The worker polls that dedicated task queue at the same time as it polls the regular queue.
Now, what if that worker dies or gets overloaded? If a Workflow Task is left on a sticky queue for too long, the engine automatically moves that task back to the regular task queue, so that it can be picked up by any worker. The exact delay before that happens is a configuration option named StickyScheduleToStartTimeout (default value is 5s in Cadence, 10s in Temporal).
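To make the routing rule concrete, here is a minimal toy model of it. This is not Cadence's actual implementation: `ToyEngine`, `Task`, `schedule`, `expire_sticky` and `poll` are hypothetical names invented for this sketch, time is passed in explicitly instead of being read from a clock, and the only value taken from the answer is the 5s Cadence default.

```python
from collections import deque
from dataclasses import dataclass, field

# Cadence default quoted above; everything else in this sketch is hypothetical.
STICKY_SCHEDULE_TO_START_TIMEOUT = 5.0  # seconds


@dataclass
class Task:
    workflow_id: str
    scheduled_at: float


@dataclass
class ToyEngine:
    """Simplified stand-in for the server-side routing of workflow tasks."""
    regular_queue: deque = field(default_factory=deque)
    sticky_queues: dict = field(default_factory=dict)  # worker_id -> deque of Task
    sticky_owner: dict = field(default_factory=dict)   # workflow_id -> worker_id

    def schedule(self, workflow_id, now):
        """Write a new task to the owner's sticky queue, else to the regular queue."""
        task = Task(workflow_id, now)
        owner = self.sticky_owner.get(workflow_id)
        if owner is None:
            self.regular_queue.append(task)
        else:
            self.sticky_queues.setdefault(owner, deque()).append(task)

    def expire_sticky(self, now):
        """Move tasks that waited too long on a sticky queue back to the regular queue."""
        for queue in self.sticky_queues.values():
            while queue and now - queue[0].scheduled_at >= STICKY_SCHEDULE_TO_START_TIMEOUT:
                task = queue.popleft()
                self.sticky_owner.pop(task.workflow_id, None)  # workflow is no longer sticky
                self.regular_queue.append(task)

    def poll(self, worker_id, now):
        """A worker polls its own sticky queue first, then the regular queue."""
        self.expire_sticky(now)
        sticky = self.sticky_queues.get(worker_id)
        if sticky:
            return sticky.popleft()
        if self.regular_queue:
            task = self.regular_queue.popleft()
            self.sticky_owner[task.workflow_id] = worker_id  # this worker becomes the owner
            return task
        return None


engine = ToyEngine()
engine.schedule("wf-1", now=0.0)
engine.poll("worker-A", now=0.1)   # worker-A picks the task up and becomes sticky owner
engine.schedule("wf-1", now=1.0)   # next task goes to worker-A's sticky queue
engine.poll("worker-B", now=2.0)   # worker-B sees nothing yet
engine.poll("worker-B", now=6.5)   # timeout elapsed: task is back on the regular queue
```

In the real system the worker polls both queues concurrently; the sequential check in `poll` is only a simplification for the sketch.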
As you pointed out, there is a performance penalty when a workflow execution gets reassigned to a different worker, since that worker has to fetch and replay the complete execution history. Note, however, that a replay is not horribly slow, because activities and timers do not need to be re-executed: they are skipped, being resolved instantly in the same order and with the same return values that were recorded the first time around. Consequently, you can generally expect replaying an existing workflow execution on a new worker to take less than a second.
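The reason replay is cheap can be illustrated with a small sketch. It is only a conceptual model, not the Cadence or Temporal SDK: `Replayer`, `call` and `order_workflow` are hypothetical names, and the "history" is just a list of (step name, result) pairs. Steps already present in the history are resolved from it without running anything; only steps beyond the end of the history are actually executed.

```python
class Replayer:
    """Hypothetical context that resolves recorded steps from history instead of running them."""

    def __init__(self, history):
        self._history = list(history)  # list of (step_name, result) pairs
        self._cursor = 0
        self.executed = []             # names of steps that really ran on this worker

    def call(self, name, fn):
        if self._cursor < len(self._history):
            recorded_name, result = self._history[self._cursor]
            if recorded_name != name:
                # The workflow code must issue the same steps in the same order.
                raise RuntimeError(f"non-deterministic replay: expected {recorded_name}, got {name}")
            self._cursor += 1
            return result              # resolved instantly, fn is never invoked
        result = fn()                  # past the end of history: execute for real
        self._history.append((name, result))
        self._cursor += 1
        self.executed.append(name)
        return result

    @property
    def history(self):
        return list(self._history)


def order_workflow(ctx):
    # Stand-ins for activities; in a real workflow these could take seconds or days.
    price = ctx.call("fetch_price", lambda: 40)
    tax = ctx.call("compute_tax", lambda: price // 10)
    return price + tax


first_worker = Replayer([])
order_workflow(first_worker)                    # runs both steps and records them

second_worker = Replayer(first_worker.history)  # a new worker fetched the full history
order_workflow(second_worker)                   # same result, nothing re-executed
```

The name check in `call` also hints at why workflow code has to be deterministic: a replay only works if the code requests the same steps in the same order as the recorded history.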
