
I am starting an ECS task on Fargate, and the container ends up in the STOPPED state after sitting in PENDING for a few minutes. The status gives the following error message:

CannotPullContainerError: context canceled

I am using PrivateLink to let the ECS host talk to the ECR registry without going via the public Internet. This is how it is configured (Serverless syntax augmenting CloudFormation):

      Type: AWS::EC2::VPCEndpoint
      Properties:
        PrivateDnsEnabled: true
        ServiceName: com.amazonaws.ap-southeast-2.ecr.dkr
        SubnetIds:
          - { Ref: frontendSubnet1 }
          - { Ref: frontendSubnet2 }
        VpcEndpointType: Interface
        VpcId: { Ref: frontendVpc }

Any ideas as to what is causing the error?

tschumann
  • It's because of the routing table - check this issue: https://github.com/aws/amazon-ecs-agent/issues/1266 – Adiii Nov 04 '19 at 07:11
  • @Adiii I'm not sure that it is - https://aws.amazon.com/premiumsupport/knowledge-center/ecs-pull-container-api-error-ecr/ says that PrivateLink is an alternative to stuffing around with routing and using PrivateLink cleared that error for me. – tschumann Nov 04 '19 at 22:33

2 Answers


Did you also add an S3 endpoint? Here is a working snippet from my template; I was able to solve the issue with the help of AWS support:

  EcrDkrEndpoint:
    Type: 'AWS::EC2::VPCEndpoint'
    Properties:
      PrivateDnsEnabled: true
      SecurityGroupIds: [!Ref 'FargateContainerSecurityGroup']
      ServiceName: !Sub 'com.amazonaws.${AWS::Region}.ecr.dkr'
      SubnetIds: [!Ref 'PrivateSubnetOne', !Ref 'PrivateSubnetTwo']
      VpcEndpointType: Interface
      VpcId: !Ref 'VPC'

For S3 you need to know that a route table is necessary - normally you would use the same one as for the internet gateway, i.e. the one containing the route 0.0.0.0/0:

  S3Endpoint:
    Type: 'AWS::EC2::VPCEndpoint'
    Properties:
      ServiceName: !Sub 'com.amazonaws.${AWS::Region}.s3'
      VpcEndpointType: Gateway
      VpcId: !Ref 'VPC'
      RouteTableIds: [!Ref 'PrivateRouteTable']

Without an endpoint for CloudWatch Logs you will get another failure, so it is necessary too:

  CloudWatchEndpoint:
    Type: 'AWS::EC2::VPCEndpoint'
    Properties:
      PrivateDnsEnabled: true
      SecurityGroupIds: [!Ref 'FargateContainerSecurityGroup']
      ServiceName: !Sub 'com.amazonaws.${AWS::Region}.logs'
      SubnetIds: [!Ref 'PrivateSubnetOne', !Ref 'PrivateSubnetTwo']
      VpcEndpointType: Interface
      VpcId: !Ref 'VPC'

EDIT: private route table:

  PrivateRoute:
    Type: AWS::EC2::Route
    DependsOn: InternetGatewayAttachement
    Properties:
      RouteTableId: !Ref 'PrivateRouteTable'
      DestinationCidrBlock: '0.0.0.0/0'
      GatewayId: !Ref 'InternetGateway'
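
For completeness, the subnets also have to be associated with that route table, otherwise the route the S3 gateway endpoint adds will never apply to them. A minimal sketch, assuming the subnet and route table names used above:

  PrivateSubnetOneRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref 'PrivateRouteTable'
      SubnetId: !Ref 'PrivateSubnetOne'

(and the same for PrivateSubnetTwo).
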
graphik_
  • I did add a VPCEndpoint for S3 too but I didn't add one for CloudWatch - what error did you get without the CloudWatch VPCEndpoint? I talked to my internal AWS team and they thought what I had should work so we decided to take a different route (give containers public IP addresses and put them in a security group) which worked. – tschumann Nov 13 '19 at 22:29
  • Did your S3 VPCEndpoint contain the parameter "RouteTableIds"? This is necessary even if the documentation says it is optional. After adding the route table, the error "CannotPullContainerError: context canceled" was solved for me. Without the CloudWatch endpoint I got this error: "DockerTimeoutError: Could not transition to started; timed out after waiting 3m0s". The support told me the following error could come up too: "CannotStartContainerError: Error response from daemon: failed to initialize logging driver" – graphik_ Nov 14 '19 at 08:40
  • No I didn't - I trusted the AWS documentation (because there is too much documentation to question it). What needs to be in the private route table? – tschumann Nov 14 '19 at 22:17
  • I trusted the documentation too and was really surprised when the support told me that the parameter is necessary. (I hope they will add a hint to the documentation.) The private route table only needs a default route to your internet gateway; I added the CloudFormation snippet for my route to my original answer. – graphik_ Nov 15 '19 at 10:50
    I received a similar error on a purely isolated VPC (with only internal VPC endpoints for S3 and ECR). Since we didn't have an internet gateway, the above didn't help. In our case it turned out to be related to the security group for the task. It was configured to only allow access to a private VPC subnet, and thus didn't have outbound access to the S3 bucket where the docker layers were stored. After consulting with AWS support, they advised us to add a rule allowing https/tcp-443 to 0.0.0.0/0 egress. This resolved the issue. – oskarpearson Dec 21 '19 at 16:24
  • The 'private route table' section fixed it for me, thanks. I was struggling with this for a few hours. – wildthing81 Nov 17 '20 at 09:06

I found I needed not only the VPC endpoints for S3, CloudWatch Logs and the two ECR endpoints (ecr.api and ecr.dkr) as detailed in @graphik_'s answer, but I also needed to ensure that the security groups on the endpoints allowed HTTPS ingress from the security group on the Fargate containers.
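
Only the ecr.dkr endpoint is shown above, so here is a minimal sketch of the second ECR endpoint (ecr.api), assuming the same subnets and security group as in @graphik_'s snippets:

  EcrApiEndpoint:
    Type: 'AWS::EC2::VPCEndpoint'
    Properties:
      PrivateDnsEnabled: true
      SecurityGroupIds: [!Ref 'FargateContainerSecurityGroup']
      ServiceName: !Sub 'com.amazonaws.${AWS::Region}.ecr.api'
      SubnetIds: [!Ref 'PrivateSubnetOne', !Ref 'PrivateSubnetTwo']
      VpcEndpointType: Interface
      VpcId: !Ref 'VPC'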

The security group on the Fargate containers needs egress access via HTTPS to the VPC endpoint security group and also to the pl-7ba54012 managed prefix list, which is S3.

This, plus the route to pl-7ba54012 in the route table, seems to be the whole picture.
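
A sketch of those security group rules, assuming the names from @graphik_'s answer plus a hypothetical EndpointSecurityGroup attached to the interface endpoints:

  # The endpoints accept HTTPS from the Fargate task security group
  EndpointIngressFromFargate:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref 'EndpointSecurityGroup'
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443
      SourceSecurityGroupId: !Ref 'FargateContainerSecurityGroup'

  # The tasks may send HTTPS to the endpoints ...
  FargateEgressToEndpoints:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId: !Ref 'FargateContainerSecurityGroup'
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443
      DestinationSecurityGroupId: !Ref 'EndpointSecurityGroup'

  # ... and to the S3 prefix list
  FargateEgressToS3:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId: !Ref 'FargateContainerSecurityGroup'
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443
      DestinationPrefixListId: pl-7ba54012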

There are policies on the VPC endpoints too, which I left as the default "All Access", but you could harden these up to only allow access from the role running the Fargate containers.
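
A sketch of such a hardened policy, assuming a hypothetical FargateExecutionRole; the PolicyDocument property sits directly on the AWS::EC2::VPCEndpoint resource:

  EcrDkrEndpoint:
    Type: 'AWS::EC2::VPCEndpoint'
    Properties:
      # PrivateDnsEnabled, ServiceName, SubnetIds etc. as shown earlier
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action: '*'
            Resource: '*'
            Condition:
              ArnEquals:
                'aws:PrincipalArn': !GetAtt 'FargateExecutionRole.Arn'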

M. Day