I'm getting CannotPullContainerError trying to launch an ECS Fargate task through an AWS Step Function. The docs here say to add a NAT gateway to the subnet. I've done that and still get this error.

I'm using a private subnet, public IP disabled, and have a NAT gateway defined. I have a route table defined to map 0.0.0.0/0 to the NAT Gateway and this route table is associated with the subnet. Associated security group and network ACL allow all outbound traffic. The VPC has DNS resolution enabled.

I've reviewed these related questions:

AWS Fargate - CannotPullContainerError (500)?

Fargate error: cannot pull container hosted in ECR from a private subnet

Fargate Task with Nat Gateway fails to connect with RDS database

Executing Step Function "Tasks" using ECS Fargate

Is there something else I'm missing? I've seen lots of questions here but have already addressed the things they mention (usually the NAT gateway and route table).

Error:

CannotPullContainerError: Error response from daemon: 
Get https://123456789012.dkr.ecr.us-west-2.amazonaws.com/v2/:
net/http: request canceled while waiting for connection
(Client.Timeout exceeded while awaiting headers)

Hopefully useful information:

subnetId: subnet-015a0400000000
networkInterfaceId: eni-04e50000000
privateIPv4Address: 10.51.17.8-2c
ClusterArn: arn:aws:ecs:us-west-2:951740000000:cluster/step-function-executor
ContainerArn: arn:aws:ecs:us-west-2:951740000000:container/08450000000
Image: 951740000000.dkr.ecr.us-west-2.amazonaws.com/step-function-image:latest
NetworkBindings: []
NetworkInterfaces:
    AttachmentId: 4a3b0000000
    PrivateIpv4Address: 10.51.17.8
TaskArn: arn:aws:ecs:us-west-2:951740000000:task/690d0000000
TaskDefinitionArn: arn:aws:ecs:us-west-2:951740000000:task-definition/step-function-xyz
LaunchType: FARGATE
PullStartedAt: 1599440424569
PullStoppedAt: 1599440513569

Route table:

    Destination       Target
    -------------     ---------------
    10.41.0.0/16      local
    0.0.0.0/0         nat-046d0000000

NAT Gateway

    Gateway ID: nat-046d0000000
    Private IP: 10.51.x.x
    Elastic IP Address: 52.13.x.x
    
Samuel Neff
  • Does your ECS task execution role allow access to ECR? – Marcin Sep 07 '20 at 23:03
  • Also, is the NAT gateway in a public subnet with working internet connectivity? – Marcin Sep 07 '20 at 23:05
  • @Marcin The subnet has "auto-assign public IP address" turned on, so that is what is needed to make it public, right? – Samuel Neff Sep 07 '20 at 23:13
  • You would also need an internet gateway (IGW) attached to your VPC, and the public subnet's route table needs a route to the IGW. – Marcin Sep 07 '20 at 23:14
  • @Marcin thanks, the role didn't have the ECR permissions. I added it now, but still same error. – Samuel Neff Sep 07 '20 at 23:17
  • So once you solved the connectivity issue, you would probably have gotten an access denied error without the role. Can you check that your VPC and subnets are set up correctly? For example, if you launch a regular instance into your public subnet, does it have internet connectivity when you ssh into it (without internet you won't be able to ssh to it directly anyway)? – Marcin Sep 07 '20 at 23:20
  • @Marcin thanks, yes, if I launch an ec2 instance in the public subnet it has internet access--dns resolution and https. – Samuel Neff Sep 07 '20 at 23:48
  • Can you update the question with the route table for the private subnet? Also ensure that the NAT gateway is in a public subnet, not a private one. – Marcin Sep 07 '20 at 23:52
  • @Marcin Thank you so much for your continued support. I added information on the route table and nat gateway. – Samuel Neff Sep 09 '20 at 01:40
  • No problem. You wrote "Associated security group and network ACL allow all outbound traffic". What about the inbound traffic? – Marcin Sep 09 '20 at 01:45
  • @Marcin private subnet allows inbound traffic from the entire vpc, denies everything else. – Samuel Neff Sep 09 '20 at 01:54
  • So that's why it probably does not work. You need to allow inbound internet traffic (0.0.0.0/0) flowing in through NAT. – Marcin Sep 09 '20 at 01:56
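The last comment can be illustrated with a small sketch, under the assumption that the "VPC only" inbound restriction sits on the subnet's network ACL. Network ACLs are stateless, so replies from ECR that come back through the NAT gateway still carry the remote public source address and land on an ephemeral port; they need their own inbound allow rule. Security groups, by contrast, are stateful and admit reply traffic automatically. The rule numbers, the `10.51.0.0/16` VPC CIDR, the remote address, and the `nacl_allows` helper below are all hypothetical and only mimic the lowest-rule-number-first evaluation.

```python
import ipaddress

def nacl_allows(rules, source_ip, port):
    """Evaluate inbound NACL-style rules: lowest rule number first, first match wins.

    rules: list of (rule_number, cidr, port_from, port_to, action) tuples,
    where action is "allow" or "deny". Falls back to the implicit deny.
    """
    ip = ipaddress.ip_address(source_ip)
    for _, cidr, port_from, port_to, action in sorted(rules, key=lambda r: r[0]):
        if ip in ipaddress.ip_network(cidr) and port_from <= port <= port_to:
            return action == "allow"
    return False  # implicit deny when no rule matches

# Hypothetical inbound rules: only traffic from inside the VPC is allowed.
vpc_only = [(100, "10.51.0.0/16", 0, 65535, "allow")]

# Same rules plus an allow for return traffic on ephemeral ports from anywhere.
with_return_traffic = vpc_only + [(110, "0.0.0.0/0", 1024, 65535, "allow")]

# A reply from a public registry address (hypothetical) to an ephemeral port.
reply_source, reply_port = "52.94.0.10", 49152

blocked = nacl_allows(vpc_only, reply_source, reply_port)
allowed = nacl_allows(with_return_traffic, reply_source, reply_port)
print(blocked, allowed)
```

With the VPC-only rules the reply is dropped, which would look like the connection timeout in the error above; adding the ephemeral-port rule lets it through.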

1 Answer

In the end the problem was with security groups. I added an existing security group to the AWS Step Function definition and that resolved the problem.
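For readers wondering where a security group goes in a Step Functions definition, here is a minimal sketch. It assumes the fix was made in the `NetworkConfiguration.AwsvpcConfiguration.SecurityGroups` field of an `ecs:runTask` task state; the answer does not show the actual definition. When that field is omitted, the task falls back to the VPC's default security group. The security group ID and the `ecs_run_task_state` helper are hypothetical; the cluster, task definition, and subnet values are the ones from the question.

```python
import json

def ecs_run_task_state(cluster_arn, task_definition_arn, subnet_ids, security_group_ids):
    """Build a Step Functions Task state that runs a Fargate task in awsvpc mode."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::ecs:runTask.sync",
        "Parameters": {
            "LaunchType": "FARGATE",
            "Cluster": cluster_arn,
            "TaskDefinition": task_definition_arn,
            "NetworkConfiguration": {
                "AwsvpcConfiguration": {
                    "Subnets": list(subnet_ids),
                    # Explicit security groups; without this field the VPC's
                    # default security group is applied to the task's ENI.
                    "SecurityGroups": list(security_group_ids),
                    # Private subnet behind a NAT gateway, so no public IP.
                    "AssignPublicIp": "DISABLED",
                }
            },
        },
        "End": True,
    }

state = ecs_run_task_state(
    cluster_arn="arn:aws:ecs:us-west-2:951740000000:cluster/step-function-executor",
    task_definition_arn="arn:aws:ecs:us-west-2:951740000000:task-definition/step-function-xyz",
    subnet_ids=["subnet-015a0400000000"],
    security_group_ids=["sg-0123456789abcdef0"],  # hypothetical ID
)
print(json.dumps(state, indent=2))
```

The printed JSON can be placed under a state name in the state machine's `States` map.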

Samuel Neff