21

I'm starting a task in ECS using Fargate and after being in PENDING for a little bit it ends up in STOPPED with the following error:

STOPPED (CannotPullContainerError: "Error response from daem

When I expand out the details I see

STOPPED (CannotPullContainerError: "Error response from daemon: Get https://id.dkr.ecr.ap-southeast-2.amazonaws.com/v2/: net/http: request canceled while waiting for connection"

with the reason

(Client.Timeout exceeded while awaiting headers)

So the task can't access the container for some reason, but I'm not sure what permission is missing and from what resource. I've read around a bit and the only real suggestion I've found is to add AssignPublicIp: ENABLED to the AwsvpcConfiguration but that didn't help.

tschumann

8 Answers

14

I managed to fix this error by enabling the public IP for every Fargate instance created with my service on ECS.

Service configuration:

{
  ...
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "subnets": [
        "my-subnets"
      ],
      "securityGroups": [
        "my-security-group"
      ],
      "assignPublicIp": "ENABLED" // <-- ENABLED HERE
    }
  }
}
gbalduzzi
  • 3
    I agree that this should work but due to security policies I wasn't able to assign public IP addresses to the Fargate instances. – tschumann Jan 23 '20 at 22:29
  • This workaround worked for me. Can you explain why? FYI - I am running fargate instances. – leeman24 Oct 21 '20 at 19:12
9

I found a solution that worked for me using Fargate. The AWS documentation states:

  1. If you're running a task using an Amazon Elastic Compute Cloud (Amazon EC2) launch type and your container instance is in a private subnet, or if you're running a task using the AWS Fargate launch type in a private subnet, confirm that your subnet has a route to a NAT gateway in the route table.

That simply means:

  1. Find the VPC that you use.
  2. In the table that lists your VPC, find its main route table.
  3. Open the route table and make sure it has an entry that points to an internet gateway.

The target will look something like igw-006b1917dc348d10d. Once that route exists, your VPC will have access to the internet and will be able to fetch your ECR image.

table example

Source: AWS docs
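The route table check can also be scripted. This is only a minimal sketch, assuming the route entries have the shape that EC2's DescribeRouteTables returns (keys DestinationCidrBlock, GatewayId, NatGatewayId). The function name internet_path and its return labels are made up for illustration.

```python
def internet_path(routes):
    """Classify where the default (0.0.0.0/0) route of a route table points.

    `routes` is assumed to be the "Routes" list of one route table.
    Returns "internet-gateway", "nat-gateway", "other" or "none".
    """
    for route in routes:
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue  # not the default route (e.g. the VPC-local route)
        if route.get("GatewayId", "").startswith("igw-"):
            return "internet-gateway"
        if route.get("NatGatewayId"):
            return "nat-gateway"
        return "other"  # a default route exists but targets something else
    return "none"
```

If this returns "none" for the route table of the subnets your tasks run in, the tasks have no route out of the VPC, which would match the timeout in the question.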

Alan Sereb
3

So it looks like the error message has changed at some point: https://aws.amazon.com/premiumsupport/knowledge-center/ecs-pull-container-api-error-ecr/ has steps to work through but mentions the error CannotPullContainerError: API error, which might be synonymous with CannotPullContainerError: "Error response from daem?

For me at least, creating an AWS::EC2::VPCEndpoint seems to have got me further.
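For anyone wanting to try the same thing, here is a rough sketch of what the AWS::EC2::VPCEndpoint resources could look like, generated from Python. It assumes (my understanding, please check the ECR docs) that pulling from ECR in a subnet without internet access needs interface endpoints for ecr.api and ecr.dkr plus a gateway endpoint for S3. The function name, logical IDs and Ref targets are hypothetical.

```python
import json


def ecr_endpoint_resources(region, vpc, subnets, security_groups, route_tables):
    """Build a CloudFormation "Resources" fragment with endpoints for ECR pulls."""

    def interface(service):
        return {
            "Type": "AWS::EC2::VPCEndpoint",
            "Properties": {
                "ServiceName": f"com.amazonaws.{region}.{service}",
                "VpcEndpointType": "Interface",
                "VpcId": vpc,
                "SubnetIds": subnets,
                "SecurityGroupIds": security_groups,
                "PrivateDnsEnabled": True,
            },
        }

    return {
        "EcrApiEndpoint": interface("ecr.api"),
        "EcrDkrEndpoint": interface("ecr.dkr"),
        # Image layers are stored in S3, reached here through a gateway endpoint.
        "S3Endpoint": {
            "Type": "AWS::EC2::VPCEndpoint",
            "Properties": {
                "ServiceName": f"com.amazonaws.{region}.s3",
                "VpcEndpointType": "Gateway",
                "VpcId": vpc,
                "RouteTableIds": route_tables,
            },
        },
    }
```

Calling json.dumps on the result with values like {"Ref": "MyVpc"} gives a fragment that can be pasted under Resources in a template. If the task logs with awslogs, a logs endpoint is probably needed as well.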

tschumann
2

This error occurs when the task is not able to pull the image. There can be several reasons, such as missing permissions or missing internet access inside the VPC.

If your VPC has only public subnets, you need an internet gateway for internet access. If your VPC has only private subnets, you need a NAT gateway so that the task can reach the Docker image to pull it.
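As a rough rule of thumb, the two cases can be written down as a small decision function. This is a simplified sketch that ignores VPC endpoints, security groups and NACLs. The function name and the string labels are made up for illustration.

```python
def can_reach_registry(default_route, assign_public_ip):
    """Rough check of whether a Fargate task has a path to an external registry.

    default_route: "internet-gateway" (public subnet), "nat-gateway"
                   (private subnet) or "none".
    assign_public_ip: whether assignPublicIp is ENABLED for the task.
    """
    if default_route == "nat-gateway":
        # The NAT gateway provides outbound access; no public IP is needed.
        return True
    if default_route == "internet-gateway":
        # Behind an internet gateway the task itself needs a public IP.
        return assign_public_ip
    return False
```

So a task in a public subnet with assignPublicIp set to DISABLED is expected to fail the pull, while the same task in a private subnet behind a NAT gateway should be fine.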

Vaseem007
1

Alan Sereb's solution worked for me.

It seems that since AWS launched Fargate platform version 1.4.0, access to a remote image registry (like the GitLab Registry in my case) goes through the VPC configured for the ECS service.

So now the Fargate container network interfaces (and therefore the VPC used by ECS) need to have internet access, so setting up an Internet Gateway in the VPC route table is mandatory.

Marino B
1

The reason is that the service running the task definition is not connected to the internet.

I had this error because my service was running in a public subnet and its tasks didn't have a public IP address.

Building on top of this answer, if you are using the Python CDK to create your service, you can specify whether the tasks within the service should use a public IP address, as well as the subnets and the security groups, while creating the service.

Basically, you should have something like this:

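from aws_cdk import aws_ec2 as ec2, aws_ecs as ecs

# cluster, task_definition, security_group and subnets are assumed to be defined elsewhere in the stack
service = ecs.FargateService(
    self,
    "service-name",
    cluster=cluster,
    task_definition=task_definition,
    service_name="service-name",
    assign_public_ip=True,  # this is important
    security_groups=[security_group],  # list of security groups, also important
    vpc_subnets=ec2.SubnetSelection(subnets=subnets),  # subnets: list of subnets
)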

For more info about the FargateService, refer to this

If you are using the CLI, you can update your service with the following command:

aws ecs update-service --service service-name --cluster the_Cluster --network-configuration "{
    \"awsvpcConfiguration\": {
      \"subnets\": [\"subnet-***\",\"subnet-****\",\"subnet-*****\"],
      \"securityGroups\": [\"sg-******\"],
      \"assignPublicIp\": \"ENABLED\"
    }
}"

For more information on how to update a service check this
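Since escaping the JSON by hand is error-prone, one option is to generate the command from Python. This is just a sketch using the standard library. The function name is made up, and the cluster, service, subnet and security group values are placeholders.

```python
import json
import shlex


def update_service_command(cluster, service, subnets, security_groups, assign_public_ip=True):
    """Return a shell-safe `aws ecs update-service` command line as a string."""
    network_configuration = {
        "awsvpcConfiguration": {
            "subnets": list(subnets),
            "securityGroups": list(security_groups),
            "assignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
        }
    }
    args = [
        "aws", "ecs", "update-service",
        "--cluster", cluster,
        "--service", service,
        # json.dumps produces the JSON; shlex.quote below handles the shell quoting.
        "--network-configuration", json.dumps(network_configuration),
    ]
    return " ".join(shlex.quote(arg) for arg in args)
```

Printing update_service_command("the_Cluster", "service-name", ["subnet-***"], ["sg-******"]) gives a single line that can be pasted into a shell.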

jtlz2
Espoir Murhabazi
  • I think `vpc_subnets=` needs to be `ec2.SubnetSelection(subnets=[...])` rather than a pure list `[...]` – jtlz2 Jun 23 '22 at 13:58
0

To pull images, ECS with Fargate uses a task execution role (e.g. ecsTaskExecutionRole) that must have the policy AmazonECSTaskExecutionRolePolicy.

When pulling images from private repositories outside ECR, this task execution role needs to authenticate against the remote container registry. As the AWS documentation states (https://docs.aws.amazon.com/AmazonECS/latest/developerguide/private-auth.html), this requires a secret holding the credentials and, so that the task execution role can access the secret, an inline policy granting secretsmanager:GetSecretValue.
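To make that concrete, here is a minimal sketch of the two pieces as Python dicts: the inline policy for the task execution role and the container definition entry that points at the secret. The function names are made up, and the ARN in the usage note is a placeholder.

```python
def private_registry_policy(secret_arn):
    """Inline IAM policy letting the task execution role read the registry secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": [secret_arn],
            }
        ],
    }


def container_definition(name, image, secret_arn):
    """Container definition fragment that pulls `image` using the stored credentials."""
    return {
        "name": name,
        "image": image,
        "repositoryCredentials": {"credentialsParameter": secret_arn},
    }
```

For example, secret_arn could be something like arn:aws:secretsmanager:ap-southeast-2:123456789012:secret:registry-credentials (hypothetical). If the secret is encrypted with a custom KMS key, the role would presumably need kms:Decrypt on that key as well.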

Assuming the image is publicly accessible in any container registry (DockerHub, ECR, GitLab, etc.) there could be other things involved.

  1. Make sure your VPC has DNS resolution enabled, or it won't be able to resolve external URLs.
  2. Make sure the subnets where the Fargate service operates have access to the internet. If they are public, the subnets will have a route table sending traffic for any IP (0.0.0.0/0) to the Internet Gateway. Otherwise, they'll have to go through a NAT Gateway to access the internet.
  3. Make sure your NACL at the subnet level and the security groups being used allow outgoing and incoming traffic.
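The security group part of point 3 can be checked with a few lines of Python. This is a minimal sketch that assumes the egress rules have the shape of IpPermissionsEgress from EC2's DescribeSecurityGroups, and it only looks at IPv4 rules open to 0.0.0.0/0 (it ignores prefix lists, IPv6 and rules that reference other security groups). The function name is made up.

```python
def allows_https_egress(egress_permissions):
    """Return True if any egress rule allows TCP 443 to 0.0.0.0/0."""
    for perm in egress_permissions:
        open_to_all = any(
            ip_range.get("CidrIp") == "0.0.0.0/0"
            for ip_range in perm.get("IpRanges", [])
        )
        if not open_to_all:
            continue
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            return True  # "-1" means all protocols and all ports
        if protocol == "tcp" and perm.get("FromPort", 0) <= 443 <= perm.get("ToPort", 65535):
            return True
    return False
```

Registry pulls go over HTTPS, so if this returns False for every security group attached to the task, a connection timeout like the one in the question would be the expected symptom.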

As a side note, there is a service under VPC called Reachability Analyzer that allows you to check a connectivity path and detect any error in a NACL or route table. For example, you can validate that a network interface within any subnet has access to the internet gateway. It works like a traceroute.

jtlz2
diegosasw
0

I had the same issue. After investigating, I found that when Auto-assign public IP is disabled, the service must reach the public internet through a private subnet associated with a NAT Gateway.

Step by step:

1. Create 2 - 3 private subnets

2. Create a new Route Table and associate it with those Subnets

3. Create a NAT Gateway

-- Assign one of your private subnets

-- Create the Service with the above Subnets

Phat Tran