I need to post-process results produced by a group of discrete Step Functions. How can I orchestrate this so that, given Step Functions A, B and C, Step Function D is triggered once A, B and C have all completed successfully?

Step Function D will take the outputs of Step Functions A, B and C as its payload. A, B and C are triggered from an external Java microservice. I have a DynamoDB table containing details of A, B and C, so I know which execution IDs belong together.

This seems to be quite a common pattern so I was hoping that there was already some sort of robust design to address it.

I have thought about using SNS to publish an event when each of Step Functions A, B and C completes, but I need to capture these events together as a group. If a Lambda captured each event, it would need to know which event it is handling and whether all the other events in the group have already been received. I could use a DynamoDB table to track each Step Function's completion status, with each Step Function updating its row when it finishes. When the Lambda receives a completion event, it could then check whether every row belonging to the group of executions is marked as completed. Would this introduce a race condition? Is this a trustworthy method?

zaf187

3 Answers

You will want to use the Optimized Service integration with Step Functions (aka "Nesting") and the Map state.

In the example below, the first Pass state generates the list of state machines to run in Group A. The list goes into the Map state, which executes each of the state machines in parallel and waits for completion using the Run a Job (.sync) integration pattern. Once all of them complete, the consolidated results are passed as an array to state machine D.

{
  "StartAt": "Generate Group A List",
  "States": {
    "Generate Group A List": {
      "Type": "Pass",
      "Result": {
        "group_a": [
          "arn:aws:states:<region>:<account_id>:stateMachine:<state_machine_A>",
          "arn:aws:states:<region>:<account_id>:stateMachine:<state_machine_B>",
          "arn:aws:states:<region>:<account_id>:stateMachine:<state_machine_C>"
        ]
      },
      "Next": "Group A"
    },
    "Group A": {
      "Type": "Map",
      "ItemProcessor": {
        "ProcessorConfig": {
          "Mode": "INLINE"
        },
        "StartAt": "Execute Group A State Machine",
        "States": {
          "Execute Group A State Machine": {
            "Type": "Task",
            "Resource": "arn:aws:states:::states:startExecution.sync:2",
            "Parameters": {
              "StateMachineArn.$": "$",
              "Input": {
                "StatePayload": "Input",
                "AWS_STEP_FUNCTIONS_STARTED_BY_EXECUTION_ID.$": "$$.Execution.Id"
              }
            },
            "End": true
          }
        }
      },
      "ItemsPath": "$.group_a",
      "Next": "Execute State Machine After"
    },
    "Execute State Machine After": {
      "Type": "Task",
      "Resource": "arn:aws:states:::states:startExecution.sync:2",
      "Parameters": {
        "StateMachineArn": "arn:aws:states:<region>:<account_id>:stateMachine:<state_machine_D>",
        "Input": {
          "group_a_results.$": "$",
          "AWS_STEP_FUNCTIONS_STARTED_BY_EXECUTION_ID.$": "$$.Execution.Id"
        }
      },
      "End": true
    }
  }
}


Justin Callison

I ended up going with my own approach: I used AWS SQS to queue the events marking each Step Function's completion. A Lambda handles the events one by one, and a DynamoDB table ties all the Step Function executions together using a common GUID.
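
For readers wondering how the tracking step can avoid the race condition raised in the question, here is a minimal sketch in Python. It is not the actual implementation; the table layout, the attribute names (`group_id`, `completed_ids`) and the helper `record_completion` are all hypothetical. The idea is to record each completion with a single atomic `ADD` to a DynamoDB string set and request `ReturnValues="ALL_NEW"`, so the decision is made on the item state returned by the write itself rather than on a separate read. The `InMemoryTable` class is only a local stand-in that mimics that one call so the sketch runs without AWS; with boto3 a real `Table` resource should be usable in its place.

```python
class InMemoryTable:
    """Hypothetical local stand-in mimicking the single update_item call used below."""

    def __init__(self):
        self.items = {}

    def update_item(self, Key, UpdateExpression, ExpressionAttributeValues, ReturnValues):
        # Emulates "ADD completed_ids :done" on a string set with ReturnValues="ALL_NEW".
        group_id = Key["group_id"]
        item = self.items.setdefault(group_id, {"group_id": group_id})
        item.setdefault("completed_ids", set()).update(ExpressionAttributeValues[":done"])
        return {"Attributes": {**item, "completed_ids": set(item["completed_ids"])}}


def record_completion(table, group_id, execution_id, expected_ids):
    """Mark one execution as completed; return True once the whole group is done.

    Adding to a set is idempotent, so a duplicate delivery of the same
    completion event does not distort the result.
    """
    response = table.update_item(
        Key={"group_id": group_id},  # assumed partition key: the common GUID
        UpdateExpression="ADD completed_ids :done",
        ExpressionAttributeValues={":done": {execution_id}},
        ReturnValues="ALL_NEW",  # decide from the post-write state, not a second read
    )
    completed = response["Attributes"].get("completed_ids", set())
    return set(expected_ids) <= set(completed)


if __name__ == "__main__":
    table = InMemoryTable()
    expected = ["exec-a", "exec-b", "exec-c"]
    for execution_id in ["exec-a", "exec-c", "exec-b"]:
        if record_completion(table, "group-1", execution_id, expected):
            print("all executions finished; Step Function D can be started")
```

A redelivered event that arrives after the group is already complete would make `record_completion` return True again, so the step that starts Step Function D should itself be idempotent, for example by guarding it with a conditional write or a deterministic execution name.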

zaf187

I'm not sure I've understood your point, but how about using a Parallel state?

  • Parallel states would work if you split the work out within the Step Function but that's not the case here. The Step Functions are independent of each other and are triggered from an external JVM. – zaf187 Jul 11 '23 at 13:19