6

I'm a newbie to AWS Step Functions and AWS Batch. I'm trying to integrate an AWS Batch job with Step Functions. The AWS Batch job executes a simple Python script which outputs a string value (a high-level, simplified requirement). I need to have the Python script's output available to the next state of the step function. How should I accomplish this? The AWS Batch job output does not contain the results of the Python script; instead, it contains all the container-related information along with the input values.

Example: the AWS Batch job executes a Python script which outputs "Hello World". I need "Hello World" to be available to the next state of the step function so that the next state can execute a Lambda with it.

pubudut

4 Answers

4

I was able to do it; below is my state machine. I took the sample project for running a batch job, Manage a Batch Job (AWS Batch, Amazon SNS), and modified it with two Lambdas for passing input/output.

{
  "Comment": "An example of the Amazon States Language for notification on an AWS Batch job completion",
  "StartAt": "Submit Batch Job",
  "TimeoutSeconds": 3600,
  "States": {
    "Submit Batch Job": {
      "Type": "Task",
      "Resource": "arn:aws:states:::batch:submitJob.sync",
      "Parameters": {
        "JobName": "BatchJobNotification",
        "JobQueue": "arn:aws:batch:us-east-1:1234567890:job-queue/BatchJobQueue-737ed10e7ca3bfd",
        "JobDefinition": "arn:aws:batch:us-east-1:1234567890:job-definition/BatchJobDefinition-89c42b1f452ac67:1"
      },
      "Next": "Notify Success",
      "Catch": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "Notify Failure"
        }
      ]
    },
    "Notify Success": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:1234567890:function:readcloudwatchlogs",
      "Parameters": {
        "LogStreamName.$": "$.Container.LogStreamName"
      },
      "ResultPath": "$.lambdaOutput",
      "Next": "ConsumeLogs"
    },
    "ConsumeLogs": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:1234567890:function:consumelogs",
      "Parameters": {
        "randomstring.$": "$.lambdaOutput.logs"
      },
      "End": true
    },
    "Notify Failure": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "Message": "Batch job submitted through Step Functions failed",
        "TopicArn": "arn:aws:sns:us-east-1:1234567890:StepFunctionsSample-BatchJobManagement17968f39-e227-47ab-9a75-08a7dcc10c4c-SNSTopic-1GR29R8TUHQY8"
      },
      "End": true
    }
  }
}

The key to reading the logs is in the Submit Batch Job output (shown below), which contains the LogStreamName. I passed that to my Lambda named function:readcloudwatchlogs, which reads the logs, and then passed the logs it read to the next function, named function:consumelogs. You can see the consumelogs function printing the logs in the attached screenshot.


{
  "Attempts": [
    {
      "Container": {
        "ContainerInstanceArn": "arn:aws:ecs:us-east-1:1234567890:container-instance/BatchComputeEnvironment-4a1593ce223b3cf_Batch_7557555f-5606-31a9-86b9-83321eb3e413/6d11fdbfc9eb4f40b0d6b85c396bb243",
        "ExitCode": 0,
        "LogStreamName": "BatchJobDefinition-89c42b1f452ac67/default/2ad955bf59a8418893f53182f0d87b4b",
        "NetworkInterfaces": [],
        "TaskArn": "arn:aws:ecs:us-east-1:1234567890:task/BatchComputeEnvironment-4a1593ce223b3cf_Batch_7557555f-5606-31a9-86b9-83321eb3e413/2ad955bf59a8418893f53182f0d87b4b"
      },
      "StartedAt": 1611329367577,
      "StatusReason": "Essential container in task exited",
      "StoppedAt": 1611329367748
    }
  ],
  "Container": {
    "Command": [
      "echo",
      "Hello world"
    ],
    "ContainerInstanceArn": "arn:aws:ecs:us-east-1:1234567890:container-instance/BatchComputeEnvironment-4a1593ce223b3cf_Batch_7557555f-5606-31a9-86b9-83321eb3e413/6d11fdbfc9eb4f40b0d6b85c396bb243",
    "Environment": [
      {
        "Name": "MANAGED_BY_AWS",
        "Value": "STARTED_BY_STEP_FUNCTIONS"
      }
    ],
    "ExitCode": 0,
    "Image": "137112412989.dkr.ecr.us-east-1.amazonaws.com/amazonlinux:latest",
    "LogStreamName": "BatchJobDefinition-89c42b1f452ac67/default/2ad955bf59a8418893f53182f0d87b4b",
    "TaskArn": "arn:aws:ecs:us-east-1:1234567890:task/BatchComputeEnvironment-4a1593ce223b3cf_Batch_7557555f-5606-31a9-86b9-83321eb3e413/2ad955bf59a8418893f53182f0d87b4b",
..
  },
..
  "Tags": {
    "resourceArn": "arn:aws:batch:us-east-1:1234567890:job/d36ba07a-54f9-4acf-a4b8-3e5413ea5ffc"
  }
}

  • Read Logs Lambda code:
import boto3

# CloudWatch Logs client used to fetch the Batch job's log events
client = boto3.client('logs')

def lambda_handler(event, context):
    print(event)
    # The state machine passes the Batch job's LogStreamName as input;
    # AWS Batch writes container logs to the /aws/batch/job log group.
    response = client.get_log_events(
        logGroupName='/aws/batch/job',
        logStreamName=event.get('LogStreamName')
    )
    # Return the first log event's message (e.g. "Hello world")
    log = {'logs': response['events'][0]['message']}
    return log
  • Consume Logs Lambda code:
import json

print('Loading function')


def lambda_handler(event, context):
    # event contains the "randomstring" key holding the log message
    # passed from the readcloudwatchlogs function
    print(event)
(Screenshots: the consumelogs function printing the logs.)

samtoddler
  • So basically you read the values from the log. There is no other way, right? – pubudut Jan 22 '21 at 20:13
  • Yeah, I looked into the [Step Function Callback](https://docs.aws.amazon.com/step-functions/latest/dg/callback-task-sample-sqs.html) pattern, but that involves adding a `wait` to the definition, which I don't want to do. – samtoddler Jan 22 '21 at 20:14
  • This is an innovative way to do it. The use case I'm dealing with could generate huge logs, so reading the log to find the output of the job could be counterproductive, but this is helpful. Another way is saving the output to a table and referring to it in another state. – pubudut Jan 22 '21 at 20:16
  • I just built on top of the question being asked, so I was able to do what was in the question. You can still pass logs around provided you are not exceeding the [AWS Step Functions payload size of 256KB](https://aws.amazon.com/about-aws/whats-new/2020/09/aws-step-functions-increases-payload-size-to-256kb/). Of course you can do it, but that again involves adding a wait in your state machine, which I already mentioned I avoided. – samtoddler Jan 22 '21 at 20:19
  • As for saving the output to a table, you can simply use [Lambda with CloudWatch Logs](https://docs.aws.amazon.com/lambda/latest/dg/services-cloudwatchlogs.html) and then save it to a table. But then you have to wait in the state machine, and you are never sure when the data has been written to the table, which might cause race conditions. – samtoddler Jan 22 '21 at 20:22
  • I upvoted your answer, but based on my requirement it's not exactly the way I was expecting to get the output. – pubudut Jan 22 '21 at 20:26
2

You could pass your step function execution ID (`$$.Execution.Id`) to the batch process; your batch process could then write its response to DynamoDB using the execution ID as the primary key (or another field). You would then need a subsequent step to read directly from DynamoDB and capture the process response.
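
For illustration, here is a minimal sketch of that subsequent step as a Lambda function. It assumes a hypothetical DynamoDB table named `batch-job-results` with an `execution_id` partition key and the Batch job's response stored under a `result` attribute; adjust the names to your setup:

import boto3

# Hypothetical table the Batch process writes its response to
TABLE_NAME = 'batch-job-results'

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(TABLE_NAME)


def lambda_handler(event, context):
    # The state machine passes the execution ID, e.g. via
    # "Parameters": {"executionId.$": "$$.Execution.Id"}
    execution_id = event['executionId']
    item = table.get_item(Key={'execution_id': execution_id}).get('Item', {})
    # Return the Batch job's response so the next state can consume it
    return {'result': item.get('result')}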

I have been on the hunt for a way to do this without the subsequent step, but thus far no dice.

Bwyss
  • I'm going to try using SSM Parameter Store because I'm too lazy to create a whole new DynamoDB table just for this... – sleepy_keita Feb 15 '22 at 05:31
1

While you can't do waitForTaskToken with submitJob, you can still use the callback pattern by passing the task token in the Parameters and referencing it in the command override with Ref::TaskToken:

...
   "Submit Batch Job": {
      "Type": "Task",
      "Resource": "arn:aws:states:::batch:submitJob.sync",
      "Parameters": {
         "Parameters": {
            "TaskToken.$": "$$.Task.Token"
         },
         "ContainerOverrides": {
            "Command": ["python3",
                        "my_script.py",
                        "Ref::TaskToken"]
         }
      }
...

Then when your script is done doing its processing, you just call StepFunctions.SendTaskSuccess or StepFunctions.SendTaskFailure:

import sys
import json
import boto3

client = boto3.client('stepfunctions')

def main():
    args = sys.argv[1:]
    # The task token arrives as the first CLI argument; output must be a JSON string
    client.send_task_success(taskToken=args[0], output=json.dumps('Hello World'))

if __name__ == '__main__':
    main()

This will tell Step Functions your job is complete and that the output is 'Hello World'. This pattern can also be useful if your Batch job has already completed the work required to resume the state machine but still needs to do some cleanup afterward: you can call send_task_success with the results, and the state machine resumes while the Batch job does the cleanup work.
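
A minimal sketch of that variant (the `cleanup()` helper is hypothetical and stands in for whatever post-completion work your job does):

import sys
import json
import boto3

client = boto3.client('stepfunctions')

def cleanup():
    # Hypothetical post-completion work, e.g. removing temporary files
    pass

def main():
    task_token = sys.argv[1]
    # Resume the state machine first by reporting the results...
    client.send_task_success(taskToken=task_token,
                             output=json.dumps({'message': 'Hello World'}))
    # ...then do the cleanup while the state machine continues
    cleanup()

if __name__ == '__main__':
    main()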

0

Thanks @samtoddler for your answer.

We used it for a while.

However, recently my friend @liorzimmerman found a better solution.

Using stepfunctions send-task-success

When calling the job from the state machine you need to send the task-token:

  "States": {
"XXX_state": {
  "Type": "Task",
  "Resource": "arn:aws:states:::batch:submitJob.sync",
  "Parameters": {
    "JobDefinition": "arn:aws:batch:us-east-1:XXX:job-definition/task_XXX:4",
    "JobQueue": "arn:aws:batch:us-east-1:XXX:job-queue/XXX-queue",
    "JobName": "XXX",
    "Parameters": {
      "TASK_TOKEN.$": "$$.Task.Token",
    }
  },
  "ResultPath": "$.payload",
  "End": true
}

Next, inside the Docker container run by the job, the results are sent back with (note that `$OUTPUT_JSON` must contain a valid JSON string):

aws stepfunctions send-task-success --task-token "$TASK_TOKEN" --task-output "$OUTPUT_JSON"