
I am trying to add error handling to my Step Functions flow using the Parallel state and the Catch field, as defined in the Amazon States Language.

Following is the flow diagram of my step functions:

[Image: flow diagram of the state machine]

Since I want a common error handler for all the steps, I wrapped them in a Parallel state and added a single Catch to it, so that an error in any of the steps is caught. After looking through various examples and blog posts, I followed this link and implemented a similar approach.

What I observe is that whenever any state raises an exception, control passes to the Catch target. By default, the input to that target is the error output: a JSON object containing Error and Cause. Since I wanted the error along with the input that was passed to the failing state, I set ResultPath to "$.error" in the catcher. Following is the JSON definition of the state machine.

{
  "StartAt": "Try",
  "States": {
    "Try": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "Step-1",
          "States": {
            "Step-1": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:eu-west-1:1234:function:step-1-lambda",
              "Next": "Step-2"
            },
            "Step-2": {
              "Type": "Choice",
              "Choices": [
                {
                  "Variable": "$.some_variable",
                  "StringEquals": "some_string",
                  "Next": "Step-3"
                },
                {
                  "Variable": "$.some_variable",
                  "StringEquals": "some_other_string",
                  "Next": "Step-4"
                }
              ],
              "Default": "Step-6"
            },
            "Step-3": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:eu-west-1:1234:function:step-3-lambda",
              "Next": "Step-6"
            },
            "Step-4": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:eu-west-1:1234:function:step-4-lambda",
              "Next": "Step-6"
            },
            "Step-6": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:eu-west-1:1234:function:step-6-lambda",
              "End": true
            }
          }
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "ResultPath": "$.error",
          "Next": "ErrorHandler"
        }
      ],
      "Next": "UnwrapOutput"
    },
    "UnwrapOutput": {
      "Type": "Pass",
      "InputPath": "$[0]",
      "End": true
    },
    "ErrorHandler": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:eu-west-1:1234:function:step-7-lambda",
      "End": true
    }
  }
}

For example, consider that Step-4 generates an exception. The input to this state is:

{
  "foo": "abc",
  "bar": "def"
}

The input with which the state machine is triggered is:

{
  "boo": "jkl",
  "baz": "mno"
}

Because Step-4 raises the exception, I was expecting the input to the ErrorHandler state to be:

{
  "foo": "abc",
  "bar": "def",
  "error": {
    "Error": "SomeError",
    "Cause": "SomeCause"
  }
}

However, the input it actually receives is built from the original input that was used to trigger the flow:

{
  "boo": "jkl",
  "baz": "mno",
  "error": {
    "Error": "SomeError",
    "Cause": "SomeCause"
  }
}
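A note on why this happens: a catcher's ResultPath is applied to the input of the state that declares the Catch, not to the input of the nested state that failed. Here the Catch is declared on the Try Parallel state, whose input is the execution input. The TypeScript sketch below simulates only the simple top-level case ("$.error"); applyResultPath is a hypothetical helper written for illustration, not an AWS API.

```typescript
type Json = Record<string, unknown>;

// Error output that Step Functions hands to a catcher.
interface ErrorOutput {
  Error: string;
  Cause: string;
}

// Hypothetical helper mimicking ResultPath "$.<key>" for one top-level key:
// returns a copy of the catching state's input with the error stored under `key`.
function applyResultPath(stateInput: Json, key: string, result: ErrorOutput): Json {
  return { ...stateInput, [key]: result };
}

const executionInput: Json = { boo: "jkl", baz: "mno" };
const step4Input: Json = { foo: "abc", bar: "def" };
const errorOutput: ErrorOutput = { Error: "SomeError", Cause: "SomeCause" };

// Catch declared on the "Try" Parallel state: its input is the execution input.
const errorHandlerInput = applyResultPath(executionInput, "error", errorOutput);

// For comparison: the same merge applied to Step-4's own input, which is what a
// catcher declared on Step-4 itself would produce.
const ifCaughtOnStep4 = applyResultPath(step4Input, "error", errorOutput);

console.log(JSON.stringify(errorHandlerInput));
// {"boo":"jkl","baz":"mno","error":{"Error":"SomeError","Cause":"SomeCause"}}
console.log(JSON.stringify(ifCaughtOnStep4));
// {"foo":"abc","bar":"def","error":{"Error":"SomeError","Cause":"SomeCause"}}
```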

I need to access the input fields of the state that raised the exception in the ErrorHandler, but referencing "$" gives me the input that was used to trigger the flow. Is there a way I can achieve this?

Any help would be appreciated; I have been trying to figure this out for a long time.

ShwetaJ
  • Thank you for including the graph and the details. I believe something like the following can help you: https://stackoverflow.com/a/64436403 or https://dev.to/aws-builders/parallel-task-error-handling-in-step-functions-4f1c – yegeniy Jan 12 '21 at 00:20

1 Answer

I'm only 10 months late, not that much haha, but I hope you have already found a solution for this. In any case, I will share my two cents so I can help another dev, or even better, so someone can show me a better way to do this!

First, let's see what scenarios we have:

  • Synchronous job execution
  • Asynchronous job execution (parallel branches)

Our goal: find out, in the error handler, which job triggered the error.

First solution (applies to all scenarios):

  • Basically, add custom try/catch blocks to all your job assets; in other words, your Lambda functions should throw an error that includes information about the job that is running. I don't like that approach much, because you are changing your isolated functions to achieve some logic in your state machine. In the end, you are coupling two separate concepts: your state machine shouldn't need external tools to operate and log about its own context. I could be wrong here, but that's only my two cents; feel free to offend my family (just kidding, but correct me as you wish).
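A minimal sketch of this first approach, assuming a Node.js Lambda written in TypeScript. JobError, runJob and the "Step-4" job name are hypothetical. For Node.js functions, Lambda reports the thrown error's name as the error type, which the Step Functions catcher exposes in its Error field, while the message ends up inside the Cause string.

```typescript
// Hypothetical error type that carries the name of the job that failed.
// Setting `name` matters: Lambda reports it as the error type, which the
// Step Functions catcher sees in the "Error" field.
export class JobError extends Error {
  readonly jobName: string;

  constructor(jobName: string, reason: string) {
    // The message travels inside the catcher's "Cause" string.
    super(JSON.stringify({ jobName, reason }));
    this.name = "JobError";
    this.jobName = jobName;
  }
}

// Runs a job's work and rethrows any failure with the job name attached.
export function runJob<T>(jobName: string, work: () => T): T {
  try {
    return work();
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    throw new JobError(jobName, reason);
  }
}

// Example Lambda handler for a hypothetical "Step-4" job.
export const handler = async (event: { foo: string; bar: string }) =>
  runJob("Step-4", () => {
    if (event.foo === "") {
      throw new Error("foo must not be empty");
    }
    return { ...event, processed: true };
  });
```

The drawback described above is visible here: every Lambda has to know its own job name, which is exactly the coupling this answer tries to avoid.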

Second solution (applies to synchronous job execution):

  • When you add a catcher to a state (addCatch in CDK), the default behavior is for the error output to overwrite the step input. To solve this, you only need to change the catcher's resultPath; this way the error output is stored alongside the step input.

    Example:

    "Catch": [
      {
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error-info",
        "Next": "ErrorHandler"
      }
    ]

But why is this important?
  • This way the error-handler job can access the step input, which means you can always put the step name into each step's input and always know which job failed. You do this through the state machine's own properties rather than by changing your Lambda functions, which solves the coupling issue! However, this won't work in the async scenario, as I'll explain next.
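A minimal sketch of such an error-handler Lambda in TypeScript, assuming the step's input carries its name under "step-info" (the CDK code below stores it as step-info.step in one version and step-info.name in the other, so both are accepted) and the catcher uses ResultPath "$.error-info". ErrorHandlerEvent, FailureReport and describeFailure are hypothetical names. For Lambda task failures the Cause string is usually JSON containing errorMessage, so the sketch parses it when it can and falls back to the raw string otherwise.

```typescript
// Error output that Step Functions stores at the catcher's ResultPath.
interface ErrorInfo {
  Error: string;
  Cause?: string;
}

// Hypothetical event shape: the failing step's own input plus the keys added
// by the state machine ("step-info" by a Pass state, "error-info" by the catcher).
interface ErrorHandlerEvent {
  "step-info"?: { name?: string; step?: string };
  "error-info": ErrorInfo;
  [key: string]: unknown;
}

export interface FailureReport {
  failedStep: string;
  error: string;
  message: string;
}

// Lambda failures usually put a JSON string with errorMessage in "Cause";
// anything else is returned unchanged.
function causeMessage(cause: string | undefined): string {
  if (cause === undefined) return "";
  try {
    const parsed = JSON.parse(cause);
    if (parsed && typeof parsed.errorMessage === "string") return parsed.errorMessage;
  } catch {
    // Not JSON: fall through and use the raw cause.
  }
  return cause;
}

export function describeFailure(event: ErrorHandlerEvent): FailureReport {
  const info = event["step-info"];
  return {
    failedStep: info?.name ?? info?.step ?? "unknown",
    error: event["error-info"].Error,
    message: causeMessage(event["error-info"].Cause),
  };
}

// Single shared error-handler Lambda: logs which step failed and returns a summary.
export const handler = async (event: ErrorHandlerEvent): Promise<FailureReport> => {
  const report = describeFailure(event);
  console.error(`Step ${report.failedStep} failed: ${report.error} ${report.message}`);
  return report;
};
```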

Third solution (applies to asynchronous job execution):

  • The previous solution won't work here: the Catch is attached to the Parallel state, so the catcher only has access to the Parallel state's input, which is the original input. So I did something similar to the last case, but inside each branch. At the start of every parallel branch I added a Pass state that records the step name in the input, followed by the job itself, so each branch behaves like the synchronous case. Each job also gets its own error-handling state, but NOT a different Lambda function: I'm not creating new resources on AWS. There's only one HandleError Lambda function, so I can focus my monitoring on that specific function, but I use it to create one error-handling state for each job my state machine has to execute.
  • The downside is the huge graph your state machine ends up with, but the upside is that you can now log which job failed.
Without any abstraction, it would look something like this (using CDK):

    import * as cdk from 'monocdk'
    import * as sfn from 'monocdk/aws-stepfunctions'
    import * as tasks from 'monocdk/aws-stepfunctions-tasks'

    // scope, id, function1, function2, function3, functionError and
    // functionThrowError are assumed to be defined elsewhere in the stack.

    const job1 = new tasks.LambdaInvoke(scope, 'First Job -- PASS', {
        lambdaFunction: function1,
        outputPath: '$.Payload'
    })

    const job2 = new tasks.LambdaInvoke(scope, 'Second Job -- PASS', {
        lambdaFunction: function2,
        outputPath: '$.Payload'
    })

    const job3 = new tasks.LambdaInvoke(scope, 'Third Job -- PASS', {
        lambdaFunction: function3,
        outputPath: '$.Payload'
    })

    // One error-handler state per job, all invoking the same Lambda function.
    // Naming the state after the step keeps it unique and stable between synths.
    const generateHandleErrorJob = (stepName: string) => new tasks.LambdaInvoke(scope, `Handle Error: ${stepName}`, {
        lambdaFunction: functionError,
        outputPath: '$.Payload'
    })

    const jobToThrowError = new tasks.LambdaInvoke(scope, 'Job To Throw Error -- PASS', {
        lambdaFunction: functionThrowError,
        outputPath: '$.Payload',
    })

    // Pass state that records the step name in the input under $.step-info.
    const generatePassCheckStep = (stepName: string) => new sfn.Pass(scope, `Pass: ${stepName}`, {
        resultPath: '$.step-info',
        result: sfn.Result.fromObject({
            step: stepName
        })
    })

    // Each branch: Pass (adds step-info) -> job. On failure, the catcher keeps
    // the job's input and adds the error output under $.error-info.
    const definition = new sfn.Parallel(scope, 'Parallel Execution -- PASS')
        .branch(generatePassCheckStep('job1').next(job1.addCatch(generateHandleErrorJob('job1'), { resultPath: '$.error-info' })))
        .branch(generatePassCheckStep('jobToThrowError').next(jobToThrowError.addCatch(generateHandleErrorJob('jobToThrowError'), { resultPath: '$.error-info' })))
        .branch(generatePassCheckStep('job2').next(job2.addCatch(generateHandleErrorJob('job2'), { resultPath: '$.error-info' })))
        .next(job3)

    new sfn.StateMachine(scope, id, {
        definition,
        timeout: cdk.Duration.minutes(3)
    })
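To see what each per-branch error handler ends up receiving, here is a simplified, plain TypeScript simulation of one branch from the CDK code above: the Pass state adds step-info, the job runs, and on failure the catcher merges error-info into the job's own input. withResult and simulateBranch are hypothetical helpers, and the Error/Cause values are simplified compared with what Lambda really reports.

```typescript
type Json = Record<string, unknown>;

// Hypothetical helper mimicking a ResultPath of the form "$.<key>".
function withResult(input: Json, key: string, value: unknown): Json {
  return { ...input, [key]: value };
}

// Simulates one branch: Pass state -> job, with a catcher on the job.
// Returns the job's output on success, or the input the error-handler state
// would receive on failure.
function simulateBranch(stepName: string, input: Json, job: (jobInput: Json) => Json): Json {
  // Pass state: resultPath '$.step-info', result { step: stepName }.
  const jobInput = withResult(input, "step-info", { step: stepName });
  try {
    return job(jobInput);
  } catch (err) {
    // Catcher: resultPath '$.error-info' merges the error output into the
    // failing job's input, which already contains step-info.
    const errorName = err instanceof Error ? err.name : "Error";
    const message = err instanceof Error ? err.message : String(err);
    return withResult(jobInput, "error-info", { Error: errorName, Cause: message });
  }
}

const executionInput: Json = { boo: "jkl", baz: "mno" };

const failed = simulateBranch("jobToThrowError", executionInput, () => {
  throw new Error("boom");
});
console.log(JSON.stringify(failed));
// {"boo":"jkl","baz":"mno","step-info":{"step":"jobToThrowError"},"error-info":{"Error":"Error","Cause":"boom"}}

const succeeded = simulateBranch("job1", executionInput, (jobInput) => ({ ok: true, sawStep: jobInput["step-info"] }));
console.log(JSON.stringify(succeeded));
// {"ok":true,"sawStep":{"step":"job1"}}
```

The key point is that the catcher lives inside the branch, so "$" in its ResultPath refers to the failing job's input rather than to the Parallel state's input.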

I also created an abstraction, ParallelStateMachineCatch, so you can use it like this:

this.definition = new ParallelStateMachineCatch(this, 'Parallel Execution', {
  // ParallelProps go here; 'Parallel Execution' is a placeholder id
}, handleErrorFunction)
  .branchCatch(job1)
  .branchCatch(job2)
  .branchCatch(job3)
  .branchCatch(job4)
  .branchCatch(job5)
  .branchCatch(job6)
  .next(final)

Here's the ParallelStateMachineCatch code:

import { Construct, Duration } from 'monocdk'
import { NodejsFunction } from 'monocdk/aws-lambda-nodejs'
import { Pass, Result, Parallel, ParallelProps } from 'monocdk/aws-stepfunctions'
import { LambdaInvoke } from 'monocdk/aws-stepfunctions-tasks'

export interface DefinitionProps {
  sonosEnvironment: string
  region: string
  accountNumber: string
}

// A Parallel state whose branches each get a "Pass Input" state (recording the
// step name) and a dedicated error-handler state backed by one shared Lambda.
export class ParallelStateMachineCatch extends Parallel {
  private errorHandler: NodejsFunction

  constructor(scope: Construct, id: string, props: ParallelProps, errorHandler: NodejsFunction) {
    super(scope, id, props)
    this.errorHandler = errorHandler
  }

  // Adds a branch: Pass Input -> task, with a catcher that stores the error
  // under $.error-info so the handler still sees the task's input.
  branchCatch(task: LambdaInvoke): ParallelStateMachineCatch {
    // State names derived from the task id are unique per task and stable between synths.
    const passInputJob = ParallelStateMachineCatch.generatePassInput(this, task.id)
    const handleErrorJob = ParallelStateMachineCatch.generateHandleErrorJob(this, this.errorHandler, task.id)
    const resultPath = '$.error-info'

    this.branch(passInputJob.next(task.addCatch(handleErrorJob, { resultPath })))

    return this
  }

  private static generateHandleErrorJob(scope: Construct, errorHandler: NodejsFunction, stepName: string): LambdaInvoke {
    return new LambdaInvoke(scope, `Handle Error ${ stepName }`, {
      lambdaFunction: errorHandler,
      outputPath: '$.Payload',
      timeout: Duration.seconds(5),
    })
  }

  private static generatePassInput(scope: Construct, stepName: string): Pass {
    return new Pass(scope, `Pass Input ${ stepName }`, {
      resultPath: '$.step-info',
      result: Result.fromObject({
        name: stepName
      })
    })
  }
}

Anyway, I hope this helps someone; that's how I managed to solve this issue. Please feel free to teach me better ways! Thanks, good luck, and good code!