
I have configured a Step Function with AWS Batch jobs. The whole configuration works well, but I need to customize the instance that gets started. For this purpose I use a Launch Template, and I built a simple (empty) configuration based on the instance type used in the AWS Batch configuration. When the Compute Environment is built with the Launch Template, the Batch job gets stuck in the RUNNABLE state. When I run the AWS Batch job without the Launch Template, everything works. Launching an instance from the template also works. Could anyone advise what is wrong or missing? Below are the definitions of all the stack elements.

Launch Template definition (screenshot not included)

Compute environment details

Overview

Compute environment name senet-cluster-r5ad-2xlarge-v3-4
Compute environment ARN arn:aws:batch:eu-central-1:xxxxxxxxxxx:compute-environment/senet-cluster-r5ad-2xlarge-v3-4
ECS Cluster name arn:aws:ecs:eu-central-1:xxxxxxxxxxxx:cluster/senet-cluster-r5ad-2xlarge-v3-4_Batch_3323aafe-d7a4-3cfe-91e5-c1079ee9d02e
Type MANAGED
Status VALID
State ENABLED
Service role arn:aws:iam::xxxxxxxxxxx:role/service-role/AWSBatchServiceRole
Compute resources
Minimum vCPUs 0
Desired vCPUs 0
Maximum vCPUs 25
Instance types r5ad.2xlarge
Allocation strategy BEST_FIT
Launch template lt-023ebdcd5df6073df
Launch template version $Default
Instance role arn:aws:iam::xxxxxxxxxxx:instance-profile/ecsInstanceRole
Spot fleet role
EC2 Keypair senet-test-keys
AMI id ami-0b418580298265d5c
vpcId vpc-0917ea63
Subnets subnet-49332034, subnet-8902a7e3, subnet-9de503d1
Security groups sg-cdbbd9af, sg-047ea19daf36aa269

AWS Batch Job Definition

{
    "jobDefinitionName": "senet-cluster-job-def-3",
    "jobDefinitionArn": "arn:aws:batch:eu-central-1:xxxxxxxxxxxxxx:job-definition/senet-cluster-job-def-3:9",
    "revision": 9,
    "status": "ACTIVE",
    "type": "container",
    "parameters": {},
    "containerProperties": {
        "image": "xxxxxxxxxxx.dkr.ecr.eu-central-1.amazonaws.com/senet/batch-process:latest",
        "vcpus": 4,
        "memory": 60000,
        "command": [],
        "jobRoleArn": "arn:aws:iam::xxxxxxxxxxxxx:role/AWSS3BatchFullAccess-senet",
        "volumes": [],
        "environment": [
            {
                "name": "BATCH_FILE_S3_URL",
                "value": "s3://senet-batch/senet_jobs.sh"
            },
            {
                "name": "AWS_DEFAULT_REGION",
                "value": "eu-central-1"
            },
            {
                "name": "BATCH_FILE_TYPE",
                "value": "script"
            }
        ],
        "mountPoints": [],
        "ulimits": [],
        "user": "root",
        "resourceRequirements": [],
        "linuxParameters": {
            "devices": []
        }
    }
}
Geo ZiDani
Batch recommends using the ECS-optimized AMI. If you want to use a custom AMI, make sure it has all the required components installed and is able to join the backend ECS cluster. Also make sure it has the configuration required for the AWS CloudWatch log driver, so that CloudWatch Logs can be used with Batch jobs. – Mech Jul 24 '20 at 23:15

1 Answer


For anyone who runs into the same problem, here is the solution that worked for me. It took me a few days to figure out.

The default AWS AMI snapshot needs at least 30 GB of storage. When you do not use a launch template, CloudFormation picks the correct storage size.

In my case, I had defined only 8 GB of storage in my launch template, and with that launch template in use the jobs got stuck in RUNNABLE.

Simply change the storage size in your launch template to 30 GB or more, and it should work.

Also, do not forget that IamInstanceProfile and SecurityGroupIds are required in the launch template for the job to start.
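To make the fix concrete, here is a minimal Python sketch of launch template data that follows the points above. The helper names `build_launch_template_data` and `undersized_volumes` are hypothetical, and the root device name `/dev/xvda` and the `gp2` volume type are assumptions that depend on your AMI. The 30 GB threshold is simply the value that worked for me, not an official limit.

```python
# Minimum root volume size in GiB; the value reported in this answer,
# not an official AWS limit.
MIN_ROOT_VOLUME_GIB = 30


def build_launch_template_data(instance_profile_arn, security_group_ids,
                               volume_size_gib=MIN_ROOT_VOLUME_GIB,
                               device_name="/dev/xvda"):
    """Build a dict shaped like EC2 LaunchTemplateData.

    device_name is an assumption: it must match the root device of your AMI.
    """
    if volume_size_gib < MIN_ROOT_VOLUME_GIB:
        raise ValueError(
            f"root volume of {volume_size_gib} GiB is below "
            f"{MIN_ROOT_VOLUME_GIB} GiB; jobs may stay in RUNNABLE"
        )
    return {
        "IamInstanceProfile": {"Arn": instance_profile_arn},
        "SecurityGroupIds": list(security_group_ids),
        "BlockDeviceMappings": [
            {
                "DeviceName": device_name,
                "Ebs": {
                    "VolumeSize": volume_size_gib,
                    "VolumeType": "gp2",  # assumption; pick what suits you
                    "DeleteOnTermination": True,
                },
            }
        ],
    }


def undersized_volumes(launch_template_data, minimum_gib=MIN_ROOT_VOLUME_GIB):
    """Return device names whose EBS volume is smaller than minimum_gib."""
    return [
        mapping["DeviceName"]
        for mapping in launch_template_data.get("BlockDeviceMappings", [])
        if mapping.get("Ebs", {}).get("VolumeSize", minimum_gib) < minimum_gib
    ]


if __name__ == "__main__":
    data = build_launch_template_data(
        "arn:aws:iam::123456789012:instance-profile/ecsInstanceRole",
        ["sg-cdbbd9af", "sg-047ea19daf36aa269"],
    )
    print(data)
```

If you use boto3, a dict like this can be passed as the `LaunchTemplateData` argument of the EC2 client's `create_launch_template` call. `undersized_volumes` can be run against the data of an existing template to spot an 8 GB volume like the one that caused my problem.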