
I'm having trouble configuring my Lambda function so that it can run AWS Batch jobs. The code looks like this:

import os

import boto3

client = boto3.client('batch')

_job_queue = os.environ['JOB_QUEUE']
_job_definition = os.environ['JOB_DEFINITION']
_job_name = os.environ['START_JOB_NAME']

def lambda_handler(event, context):
    return start_job()

def start_job():
    # list_jobs returns only RUNNING jobs unless jobStatus is given.
    response = client.list_jobs(jobQueue=_job_queue)
    # Each job summary is a dict, so use key access rather than attribute access.
    if _job_name in [job['jobName'] for job in response['jobSummaryList']]:
        return 200

    try:
        client.submit_job(jobName=_job_name, jobQueue=_job_queue, jobDefinition=_job_definition)
        return 201
    except Exception:
        return 400

It's failing on client.list_jobs(jobQueue=_job_queue), with the following error:

"errorMessage": "An error occurred (AccessDeniedException) when calling the ListJobs operation: User: arn:aws:sts::749340585813:assumed-role/myproject/dev-StartJobLambda-HZO22Z5IMTFB is not authorized to perform: batch:ListJobs on resource: arn:aws:batch:us-west-2:749340585813:/v1/listjobs",

If I add my access keys to the lambda above, it works fine. I assume this is because I have administrator access, and authenticating as my user gives the lambda my privileges.

My lambda definition looks like:

"StartJobLambda": {
  "Type": "AWS::Lambda::Function",
  "Properties": {
    "Description": "Starts the My Project model training job.",
    "Role": {
      "Fn::GetAtt": [
        "StartJobRole",
        "Arn"
      ]
    },
    "Runtime": "python3.6",
    "Handler": {
      "Fn::Sub": "${StartJobModule}.lambda_handler"
    },
    "Tags": [
      {
        "Key": "environment",
        "Value": {
          "Ref": "Environment"
        }
      },
      {
        "Key": "project",
        "Value": "myproject"
      }
    ],
    "Environment": {
      "Variables": {
        "JOB_QUEUE": {
          "Ref": "JobQueue"
        },
        "JOB_DEFINITION": {
          "Ref": "TrainingJob"
        }
      }
    },
    "Code": {
      "S3Bucket": {
        "Ref": "CodeBucket"
      },
      "S3Key": {
        "Ref": "StartJobKey"
      }
    },
    "VpcConfig": {
      "SubnetIds": [
        {
          "Fn::ImportValue": {
            "Fn::Sub": "${NetworkStackNameParameter}-PrivateSubnet"
          }
        },
        {
          "Fn::ImportValue": {
            "Fn::Sub": "${NetworkStackNameParameter}-PrivateSubnet2"
          }
        }
      ],
      "SecurityGroupIds": [
        {
          "Fn::ImportValue": {
            "Fn::Sub": "${NetworkStackNameParameter}-TemplateSecurityGroup"
          }
        }
      ]
    }
  }
}

The following role and policy are also created:

"StartJobRole": {
  "Type": "AWS::IAM::Role",
  "Properties": {
    "RoleName": "myproject-start-job",
    "AssumeRolePolicyDocument": {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": [
              "lambda.amazonaws.com"
            ]
          },
          "Action": [
            "sts:AssumeRole"
          ]
        }
      ]
    },
    "Path": "/"
  }
},
"StartJobBatchPolicy": {
  "Type": "AWS::IAM::Policy",
  "Properties": {
    "PolicyName": "start-job-batch-policy",
    "PolicyDocument": {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "batch:ListJobs",
            "batch:SubmitJob"
          ],
          "Resource": [
            {
              "Ref": "JobQueue"
            }
          ]
        }
      ]
    },
    "Roles": [
      {
        "Ref": "StartJobRole"
      }
    ]
  }
}

In addition, there is a role to enable the lambda to run on a VPC:

"LambdaVPCExecutionRole": {
  "Type": "AWS::IAM::Role",
  "Properties": {
    "RoleName": "myproject-lambda-vpc-execution-role",
    "AssumeRolePolicyDocument": {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": [
              "lambda.amazonaws.com"
            ]
          },
          "Action": [
            "sts:AssumeRole"
          ]
        }
      ]
    },
    "Path": "/"
  }
},
"LambdaVPCExecutionPolicy": {
  "Type": "AWS::IAM::Policy",
  "Properties": {
    "PolicyName": "lambda-vpc-execution-policy",
    "PolicyDocument": {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents"
          ],
          "Resource": "arn:aws:logs:*:*:*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "ec2:CreateNetworkInterface",
            "ec2:DescribeNetworkInterfaces",
            "ec2:DeleteNetworkInterface"
          ],
          "Resource": "*"
        }
      ]
    },
    "Roles": [
      {
        "Ref": "LambdaVPCExecutionRole"
      },
      {
        "Ref": "StartJobRole"
      }
    ]
  }
},
Nate Reed
  • after you have created this stack, if you look in IAM under roles can you confirm the ListJobs action is authorized for resource level permissions? – Usman Mutawakil Feb 22 '18 at 21:22
  • Under "IAM Roles" I see my lambda execution role, and the policies that are attached. The policy StartJobBatchPolicy, which lists the "batch:ListJobs" action for the job queue resource, is visible. – Nate Reed Feb 22 '18 at 21:38
  • When I use the policy editor, it says I'm not allowed to specify resources. It creates a policy that applies to all resources (e.g. Resource: "*"). – Nate Reed Feb 22 '18 at 21:42
  • I'm going to try recreating it with the change to "Resource" I noted above. – Nate Reed Feb 22 '18 at 21:44
  • Thank you, that pointed me in the right direction. That small change fixed the issue. Unbelievable! :) – Nate Reed Feb 22 '18 at 22:03
  • Yea these silent failures are a paaain! For the sake of others I've posted this as an answer. Feel free to edit if it doesn't encapsulate your problem/solution. – Usman Mutawakil Feb 23 '18 at 08:47

1 Answer


This is something CloudFormation needs to improve on. Some AWS actions don't support resource-level permissions, yet when you create a policy that scopes them to a specific resource, your stack will still succeed! For IAM-related issues, you sometimes need to go into the console and verify that your policy is not in a warning state. At a minimum, AWS will flag policies that attempt to apply resource-level permissions to actions that don't support them.

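In this case, batch:ListJobs does not accept a resource ARN, so the policy statement has to use "Resource": "*" instead of the job queue. The same is true of some actions in other services, such as dynamodb:ListTables, which can't be confined or restricted to a single table. If you scope such an action to one resource in a CloudFormation IAM policy, stack creation will not fail, but your desired effect will not be achieved.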
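Since neither CloudFormation nor the API call warns you up front, a small check over the policy document can catch this before deployment. The following is only a rough sketch, not an official tool: the function names and the WILDCARD_ONLY_ACTIONS set are hypothetical and hand-maintained (batch:ListJobs comes from the error in the question), and the queue ARN is a placeholder.

```python
# Assumption: a hand-maintained set of actions that only work with "Resource": "*".
# batch:ListJobs is taken from the error in the question; extend it as needed.
WILDCARD_ONLY_ACTIONS = {"batch:ListJobs"}


def as_list(value):
    """IAM accepts either a single value or a list for Action, Resource and Statement."""
    return value if isinstance(value, list) else [value]


def find_misscoped_actions(policy_document, wildcard_only=WILDCARD_ONLY_ACTIONS):
    """Return wildcard-only actions found in Allow statements that lack Resource "*"."""
    problems = []
    for statement in as_list(policy_document.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        if "*" in as_list(statement.get("Resource", [])):
            continue
        for action in as_list(statement.get("Action", [])):
            if action in wildcard_only:
                problems.append(action)
    return problems


# The policy from the question, with a placeholder ARN standing in for the job queue.
original_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["batch:ListJobs", "batch:SubmitJob"],
        "Resource": ["arn:aws:batch:us-west-2:123456789012:job-queue/example-queue"],
    }],
}

# The same policy with the change described in the comments: Resource set to "*".
fixed_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["batch:ListJobs", "batch:SubmitJob"],
        "Resource": "*",
    }],
}

print(find_misscoped_actions(original_policy))  # ['batch:ListJobs']
print(find_misscoped_actions(fixed_policy))     # []
```

The check only looks for a literal "*", so it also flags a template that still uses {"Ref": "JobQueue"} as the resource.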

Usman Mutawakil