
I'm writing a workflow using Amazon SWF and was wondering how to control the TPS on downstream services.

I have a parent workflow that kicks off several child workflows that run in parallel.

My child workflow calls several downstream services, each within a different activity, e.g.:

  1. call downstream service 1, if success proceed, if fail exit
  2. call downstream service 2, if success proceed, if fail exit
  3. etc

and I want to be able to manage the TPS on the downstream services separately.

How can I limit TPS on the downstream services? For example, I would ideally like to say that I want a max TPS of 100 for downstream service 1. In a single-process context I could use something like a Guava RateLimiter, but this will be running across multiple hosts. Can I specify that I only want 100 instances of a given activity running at a time? I couldn't find an annotation for that in the Flow framework (I am using the Flow framework and Spring). I am happy to break the child workflow into separate workflows if required and have the parent workflow call each child workflow one after the other, e.g.:

child workflow:

1. take entity id as input
2. call dependent service 1 workflow
3. return

If the above completes successfully, the parent workflow calls the next child workflow, which calls dependent service 2; if it fails, the parent quits.

Is it possible to set concurrency limits on the number of instances of a given dependent service workflow or of a given activity? Is this a good or possible use of Task Lists? Can I control the TPS via the number of worker hosts?

Thanks for any suggestions!

Allan5

1 Answer


I don't think SWF natively supports rate limiting of activity executions or, to be precise, a maximum task delivery rate for a task list.

The alternative is to implement rate limiting at the worker level. ActivityWorker already supports rate limiting through setMaximumPollRatePerSecond. If a single worker can sustain the calling rate, then electing that worker as the master and pausing all other workers through suspendPolling solves the problem. If more than one worker is necessary, then multiple masters are active at the same time, each rate limited to a portion of the overall rate.
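To make the "portion of the overall rate" arithmetic concrete, here is a minimal sketch in plain Java. The class and method names (PollRateShare, mastersNeeded, perMasterRate) are made up for illustration and are not part of the Flow framework, and the even split is just one possible policy. It also assumes each activity task makes one downstream call, so the poll rate approximates the call rate.

```java
// Hypothetical helper, not part of the Flow framework: splits an overall
// rate budget evenly across the workers that are elected as masters.
final class PollRateShare {
    private PollRateShare() {}

    /** Smallest number of masters needed if one worker sustains at most perWorkerCapacity tasks/sec. */
    static int mastersNeeded(double totalRatePerSecond, double perWorkerCapacity) {
        if (totalRatePerSecond <= 0 || perWorkerCapacity <= 0) {
            throw new IllegalArgumentException("rates must be positive");
        }
        return (int) Math.ceil(totalRatePerSecond / perWorkerCapacity);
    }

    /** Poll rate for each master so that the sum over all masters equals the overall budget. */
    static double perMasterRate(double totalRatePerSecond, int masterCount) {
        if (totalRatePerSecond <= 0 || masterCount <= 0) {
            throw new IllegalArgumentException("rate and master count must be positive");
        }
        return totalRatePerSecond / masterCount;
    }

    public static void main(String[] args) {
        // Assumed numbers: 100 TPS budget, one worker can sustain about 40 TPS.
        int masters = mastersNeeded(100.0, 40.0);
        double share = perMasterRate(100.0, masters);
        System.out.println(masters + " masters, each limited to " + share + " polls/sec");
        // Each master would then pass `share` to setMaximumPollRatePerSecond,
        // while every non-master worker stays paused via suspendPolling.
    }
}
```

With a single master the share is simply the whole budget, which is the first case described above.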

A separate SWF workflow can be used to elect one or more workers as master. The basic idea of the master election workflow is to have a GetLock activity. The host that gets to execute it is considered the master. This activity should have a small heartbeat timeout (let's say 20 seconds) and a large overall timeout, so the host that owns it has to heartbeat at least once every 20 seconds to keep the lock. If it fails to heartbeat for some reason, the workflow gets the failure and reschedules the activity for another host to grab.
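The lock semantics can be illustrated with a small in-memory model. This is only a sketch of the behaviour, not SWF code: in the real setup SWF itself tracks the heartbeat timeout of the GetLock activity and the election workflow does the rescheduling. The HeartbeatLease class, its methods, and the host names are hypothetical, and time is passed in explicitly so the example stays deterministic.

```java
// In-memory model of the GetLock idea: a host keeps the lock only while it
// heartbeats within the timeout; after that another host may take over.
final class HeartbeatLease {
    private final long heartbeatTimeoutMillis;
    private String owner;              // null while nobody holds the lock
    private long lastHeartbeatMillis;

    HeartbeatLease(long heartbeatTimeoutMillis) {
        this.heartbeatTimeoutMillis = heartbeatTimeoutMillis;
    }

    /** Takes the lock if it is free, already ours, or the owner's heartbeat has expired. */
    synchronized boolean tryAcquire(String host, long nowMillis) {
        if (owner != null && !owner.equals(host) && !isExpired(nowMillis)) {
            return false;              // another host still holds a live lock
        }
        owner = host;
        lastHeartbeatMillis = nowMillis;
        return true;
    }

    /** Renews the lock; returns false if the caller lost it or let it expire. */
    synchronized boolean heartbeat(String host, long nowMillis) {
        if (owner == null || !owner.equals(host) || isExpired(nowMillis)) {
            return false;
        }
        lastHeartbeatMillis = nowMillis;
        return true;
    }

    /** Current master, or null if the lock is free or expired. */
    synchronized String currentOwner(long nowMillis) {
        return (owner != null && !isExpired(nowMillis)) ? owner : null;
    }

    private boolean isExpired(long nowMillis) {
        return nowMillis - lastHeartbeatMillis > heartbeatTimeoutMillis;
    }
}
```

A master that finds heartbeat returning false would stop acting as master (for example by suspending polling again), which corresponds to the activity failing and being rescheduled on another host.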

Maxim Fateev