
I have a container running in ECS and it's using boto3 to connect to ssm.us-east-2.amazonaws.com. The connection is timing out. The container is using network mode awsvpc and I don't have a NAT Gateway. I thought this wouldn't be a problem since the EC2 instance and the container are both in a public subnet… but I could be wrong. When I SSH into the EC2 instance that's running the container, I'm able to ping the SSM host, but somehow the container can't reach it.

I had a situation last month where a container was relaunching repeatedly and accessing ECR through a NAT Gateway, and the result was terabytes of traffic and a huge bill. I'd really like to avoid using a NAT Gateway if possible.

How do I diagnose the problem here? The app is quitting immediately because it fails to access AWS SSM. Here is the security group for the EC2 instance:

```hcl
module "sg" {
  source  = "cloudposse/security-group/aws"
  version = "0.4.3"

  # Allow unlimited egress
  allow_all_egress = true

  rules_map = {
    "API" = [{
      type        = "ingress"
      from_port   = 5050
      to_port     = 5050
      protocol    = "tcp"
      cidr_blocks = module.subnets.public_subnet_cidrs
      self        = null
      description = "Allow calling API (HTTP) from IPs in our public subnets (which includes the ALB)"
    }],
    "SSH" = [{
      type        = "ingress"
      from_port   = 22
      to_port     = 22
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
      self        = null
      description = "Allow SSH from all IPs"
    }]
  }

  vpc_id  = module.vpc.vpc_id
  context = module.this.context
}
```

I'm also using this security group with the ecs-alb-service-task I declared in a previous question. I am not sure whether the problem is with a security group, the networking mode, or something else. The AWS documentation on network modes strongly suggests that awsvpc is the preferred mode, but I still don't really understand the implications or how to pick the right one. I have also tried using the default mode (bridge since I'm on Amazon Linux) and I get the same error.
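
One way to narrow this down from inside the container is to test the TCP path to the endpoint directly, since a successful ping from the host only exercises ICMP from the instance's own network interface. The following is a minimal sketch using only the Python standard library; the helper name `check_tcp` and its return labels are hypothetical and not part of boto3 or any AWS tooling.

```python
import socket


def check_tcp(host, port=443, timeout=5.0, connect=socket.create_connection):
    """Classify one TCP connection attempt to host:port.

    Returns "ok", "dns_failure", "timeout", or "error: <detail>".
    The `connect` parameter exists only so the function can be exercised
    without real network access; it is an assumption of this sketch.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except socket.gaierror:
        # The name could not be resolved: look at DNS settings first.
        return "dns_failure"
    except (socket.timeout, TimeoutError):
        # No reply at all: typically no usable route out (for example no
        # public IP on the task's network interface, no NAT, no VPC endpoint).
        return "timeout"
    except OSError as exc:
        # Anything else, such as a refused connection or unreachable network.
        return f"error: {exc}"
    conn.close()
    return "ok"
```

Calling `check_tcp("ssm.us-east-2.amazonaws.com")` once on the EC2 host and once inside the container should show whether the two really see different network paths. A `"timeout"` only from the container would point at routing for the task's network interface rather than at the security group.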

Old Pro
Nick K9
  • The security group allows all egress, so that isn't your issue. My first guess is that the ECS task isn't receiving a public IP address. In the `cloudposse/ecs-alb-service-task/aws` module that I saw you using in your previous question, you would need to set `assign_public_ip = true`. – Mark B May 04 '22 at 13:12
  • Thank you for the suggestion. I have tried to enable that, but [it wouldn't accept it](https://github.com/aws/aws-cdk/issues/13348#issuecomment-791061624) with `network_mode = "awsvpc"`. So then I tried changing the `network_mode` to `null` [as specified by the docs](https://registry.terraform.io/modules/cloudposse/ecs-alb-service-task/aws/latest#input_network_mode), but the apply is still failing with "Network Configuration must be provided when networkMode 'awsvpc' is specified". But I'm not specifying `awsvpc` anymore! – Nick K9 May 04 '22 at 14:44
  • I didn't realize that limitation existed. I've long since switched everything from EC2 deployments to Fargate on the projects I work on. It looks like if you are using EC2 deployments you are stuck using a NAT Gateway, or you need to create a VPC Endpoint for SSM. – Mark B May 04 '22 at 15:00
  • Thanks for the advice. I have switched to Fargate, which TBH wasn't that difficult, and it's working now. – Nick K9 May 04 '22 at 19:37
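
For readers who stay on EC2 and want the VPC Endpoint route mentioned above instead of a NAT Gateway, here is a rough sketch of the request using boto3's EC2 `create_vpc_endpoint` call. The helper names `ssm_endpoint_params` and `create_ssm_endpoint` are hypothetical, the IDs passed in are placeholders, and the sketch assumes the VPC has DNS support and DNS hostnames enabled (needed for private DNS) and that the endpoint's security group allows inbound TCP 443 from the tasks.

```python
def ssm_endpoint_params(vpc_id, subnet_ids, security_group_ids, region="us-east-2"):
    """Build keyword arguments for an interface VPC endpoint to SSM."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # Regional service name for Systems Manager.
        "ServiceName": f"com.amazonaws.{region}.ssm",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Lets the normal ssm.<region>.amazonaws.com hostname resolve to
        # the endpoint's private addresses inside the VPC.
        "PrivateDnsEnabled": True,
    }


def create_ssm_endpoint(vpc_id, subnet_ids, security_group_ids, region="us-east-2"):
    """Create the endpoint and return its ID (requires AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    params = ssm_endpoint_params(vpc_id, subnet_ids, security_group_ids, region)
    response = ec2.create_vpc_endpoint(**params)
    return response["VpcEndpoint"]["VpcEndpointId"]
```

In a Terraform setup like the one in the question, the same settings would more naturally be declared with an `aws_vpc_endpoint` resource; the Python form here only illustrates which values are involved.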
