
tl;dr

I can't access my service through the ALB DNS name; requests to the URL time out.

I noticed that, looking at the routes from the IGW and NAT, there's a subnet (Public Subnet 2) with isolated routing, and also a task that's not being exposed through the ALB because it somehow got attached to a different subnet.


More general context

I have Terraform modules defining:

  • An ECS cluster, service, and task definition
  • The ALB setup, including a target group and a listener
  • A couple of public subnets and a security group for the ALB
  • Private subnets and their own security group for ECS
  • A target group port that already matches the container port

Using CodePipeline I get a task running, and I can see my service's logs, which means it starts.

Some questions

  • Can I have multiple IGWs associated with a single NAT within a single VPC? (See the route-table sketch after this list.)
  • Tasks get attached to a couple of private subnets and an SG that allows traffic from the ALB SG. Also, the tasks need to reach a Redis instance, so I'm attaching to them the SG and subnet where the ElastiCache node lives (shown in the Terraform module below). Any advice here? (See the security-group sketch after this list.)
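
On the first question, this is roughly how I understand the public side should be wired; the sketch below is only an illustration (the data-source lookup of the already-attached IGW and the resource names are assumptions, they aren't in my modules yet): a single route table sends the public subnets' non-local traffic to the one IGW attached to the VPC, while the NAT gateways in the module below handle egress for the private subnets.

# Sketch only: look up the IGW already attached to the VPC (assumed names)
data "aws_internet_gateway" "existing" {
  filter {
    name   = "attachment.vpc-id"
    values = [data.aws_vpc.vpc.id]
  }
}

# One shared route table sends the public subnets' non-local traffic to the IGW
resource "aws_route_table" "public" {
  vpc_id = data.aws_vpc.vpc.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = data.aws_internet_gateway.existing.id
  }
}

# Explicitly associate both public subnets so they don't fall back to the main route table
resource "aws_route_table_association" "public_a" {
  subnet_id      = aws_subnet.public_subnet_us_east_1a.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "public_b" {
  subnet_id      = aws_subnet.public_subnet_us_east_1b.id
  route_table_id = aws_route_table.public.id
}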
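
On the Redis question, the security group side of it looks roughly like the sketch below (placeholder names; the real resource lives in another module): the ElastiCache node's SG allows 6379 from the ECS tasks' SG, which works regardless of which subnet the tasks land in.

# Sketch only: SG on the ElastiCache node allowing Redis traffic from the tasks' SG
resource "aws_security_group" "redis_sg" {
  name        = "audible-blog-us-${var.environment}-redis-sg"
  description = "ECS tasks to ElastiCache"
  vpc_id      = data.aws_vpc.vpc.id

  ingress {
    from_port       = 6379
    to_port         = 6379
    protocol        = "tcp"
    security_groups = [aws_security_group.ecs_tasks_sg.id]
  }
}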

ALB and networking resources

variable "vpc_id" {
  type = string
  default = "vpc-0af6233d57f7a6e1b"
}

variable "environment" {
  type = string
  default = "dev"
}

data "aws_vpc" "vpc" {
  id = var.vpc_id
}


### Public subnets
resource "aws_subnet" "public_subnet_us_east_1a" {
  vpc_id                  = data.aws_vpc.vpc.id
  cidr_block              = "10.0.10.0/24"
  map_public_ip_on_launch = true
  availability_zone       = "us-east-1a"

  tags = {
    Name = "audible-blog-us-${var.environment}-public-subnet-1a"
  }
}

resource "aws_subnet" "public_subnet_us_east_1b" {
  vpc_id                  = data.aws_vpc.vpc.id
  cidr_block              = "10.0.11.0/24"
  availability_zone       = "us-east-1b"
  map_public_ip_on_launch = true

  tags = {
    Name = "audible-blog-us-${var.environment}-public-subnet-1b"
  }
}


### Private subnets

resource "aws_subnet" "private_subnet_us_east_1a" {
  vpc_id                  = data.aws_vpc.vpc.id
  cidr_block              = "10.0.12.0/24"
  map_public_ip_on_launch = true
  availability_zone       = "us-east-1a"

  tags = {
    Name = "audible-blog-us-${var.environment}-private-subnet-1a"
  }
}

resource "aws_subnet" "private_subnet_us_east_1b" {
  vpc_id                  = data.aws_vpc.vpc.id
  cidr_block              = "10.0.13.0/24"
  availability_zone       = "us-east-1b"
  tags = {
    Name = "audible-blog-us-${var.environment}-private-subnet-1b"
  }
}

# Create a NAT gateway with an EIP for each private subnet to get internet connectivity
resource "aws_eip" "gw_a" {
  vpc        = true
}

resource "aws_eip" "gw_b" {
  vpc        = true
}
resource "aws_nat_gateway" "gw_a" {
  subnet_id     = aws_subnet.public_subnet_us_east_1a.id
  allocation_id = aws_eip.gw_a.id
}
resource "aws_nat_gateway" "gw_b" {
  subnet_id     = aws_subnet.public_subnet_us_east_1b.id
  allocation_id = aws_eip.gw_b.id
}

# Create a new route table for the private subnets
# And make it route non-local traffic through the NAT gateway to the internet
resource "aws_route_table" "private_a" {
  vpc_id = data.aws_vpc.vpc.id
  route {
    cidr_block = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.gw_a.id
  }
}

resource "aws_route_table" "private_b" {
  vpc_id = data.aws_vpc.vpc.id
  route {
    cidr_block = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.gw_b.id
  }
}
# Explicitly associate the newly created route tables with the private subnets (so they don't default to the main route table)
resource "aws_route_table_association" "private_a" {
  subnet_id      = aws_subnet.private_subnet_us_east_1a.id
  route_table_id = aws_route_table.private_a.id
}

resource "aws_route_table_association" "private_b" {
  subnet_id      = aws_subnet.private_subnet_us_east_1b.id
  route_table_id = aws_route_table.private_b.id
}


# This is the group you need to edit if you want to restrict access to your application
resource "aws_security_group" "alb_sg" {
  name        = "audible-blog-us-${var.environment}-lb-sg"
  description = "Internet to ALB Security Group"
  vpc_id      = data.aws_vpc.vpc.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
  tags = {
    name = "audible-blog-us-${var.environment}-lb-sg"
  }
}

# Traffic to the ECS Cluster should only come from the ALB
resource "aws_security_group" "ecs_tasks_sg" {
  name        = "audible-blog-us-${var.environment}-ecs-sg"
  description = "ALB to ECS Security Group"
  vpc_id      = data.aws_vpc.vpc.id

  ingress {
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
    security_groups = [ aws_security_group.alb_sg.id ]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
  tags = {
    name = "audible-blog-us-${var.environment}-ecs-sg"
  }
}

resource "aws_alb" "alb" {
  name                = "audible-blog-us-${var.environment}-alb"
  internal            = false
  load_balancer_type  = "application"
  subnets             = [ aws_subnet.public_subnet_us_east_1a.id,  aws_subnet.public_subnet_us_east_1b.id ]
  security_groups     = [ aws_security_group.alb_sg.id ]

  tags = {
    name        = "audible-blog-us-${var.environment}-alb"
    environment = var.environment
  }
}

resource "aws_alb_target_group" "target_group" {
  name        = "audible-blog-us-${var.environment}-target-group"
  port        = "8080"
  protocol    = "HTTP"
  vpc_id      = data.aws_vpc.vpc.id
  target_type = "ip"

  health_check {
    enabled = true
    path = "/blog"
    interval = 30
    matcher = "200-304"
    port = "traffic-port"
    unhealthy_threshold = 5
  }

  depends_on = [aws_alb.alb]
}



resource "aws_alb_listener" "web_app_http" {
  load_balancer_arn = aws_alb.alb.arn
  port              = 80
  protocol          = "HTTP"
  depends_on        = [aws_alb_target_group.target_group]

  default_action {
    target_group_arn = aws_alb_target_group.target_group.arn
    type             = "forward"
  }
}

output "networking_details" {
  value = {
    load_balancer_arn = aws_alb.alb.arn
    load_balancer_target_group_arn = aws_alb_target_group.target_group.arn
    subnets = [
      aws_subnet.private_subnet_us_east_1a.id,
      aws_subnet.private_subnet_us_east_1b.id
    ]
    security_group = aws_security_group.ecs_tasks_sg.id
  }
}

ECS Fargate module

module "permissions" {
  source = "./permissions"
  environment = var.environment
}

resource "aws_ecs_cluster" "cluster" {
  name = "adl-blog-us-${var.environment}"
}

resource "aws_cloudwatch_log_group" "logs_group" {
  name = "/ecs/adl-blog-us-next-${var.environment}"
  retention_in_days = 90
}

resource "aws_ecs_task_definition" "task" {
  family                  = "adl-blog-us-task-${var.environment}"
  container_definitions   = jsonencode([
    {
      name      = "adl-blog-us-next"
      image     = "536299334720.dkr.ecr.us-east-1.amazonaws.com/adl-blog-us:latest"
      portMappings = [
        {
          containerPort = 8080
          hostPort      = 8080
        },
        {
          containerPort = 6379
          hostPort      = 6379
        }
      ]
     environment: [
        {
            "name": "ECS_TASK_FAMILY",
            "value": "adl-blog-us-task-${var.environment}"
        }
      ],
      logConfiguration: {
        logDriver: "awslogs",
        options: {
            awslogs-group: "/ecs/adl-blog-us-next-${var.environment}",
            awslogs-region: "us-east-1",
            awslogs-stream-prefix: "ecs"
        }
      },
      healthCheck: {
        retries: 3,
        command: [
            "CMD-SHELL",
            "curl -sf http://localhost:8080/blog || exit 1"
        ],
        timeout: 5,
        interval: 30,
        startPeriod: null
      }
    }
  ])
  cpu       = 256
  memory    = 512
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  execution_role_arn = module.permissions.task_definition_execution_role_arn
  task_role_arn = module.permissions.task_definition_execution_role_arn
}

resource "aws_ecs_service" "service" {
  name            = "adl-blog-us-task-service-${var.environment}"
  cluster         = aws_ecs_cluster.cluster.id
  deployment_controller {
    type = "ECS"
  }
  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 50
  task_definition = aws_ecs_task_definition.task.family
  desired_count   = 3
  launch_type = "FARGATE"
  network_configuration {
    subnets =           concat(
      var.public_alb_networking_details.subnets,
      [ var.private_networking_details.subnet.id ]
    )
    security_groups =   [
      var.public_alb_networking_details.security_group,
      var.private_networking_details.security_group.id
    ]
    assign_public_ip =  true
  }
  load_balancer {
    target_group_arn = var.public_alb_networking_details.load_balancer_target_group_arn
    container_name   = "adl-blog-us-next"
    container_port   = 8080
  }

  force_new_deployment = true
  lifecycle {
    ignore_changes = [desired_count]
  }

  depends_on = [
    module.permissions
  ]
}

variable "private_networking_details" {}
variable "public_alb_networking_details" {}
variable "environment" {
  type = string
}
diegoaguilar
  • What are the SGs for Fargate? Also, can you access your app on Fargate without the ALB? In other words, can you confirm that the issue is only with the ALB, and that your app actually works without it? – Marcin Jul 26 '21 at 03:57
  • @Marcin I've just declared the SG that is shown in the Terraform module, and it is assigned to the ALB. My indication that the Fargate task is working is that I can see the logs through CloudWatch and that the ALB target group health check succeeds – diegoaguilar Jul 26 '21 at 04:01
  • The VPC that you use, is it a custom one? I see you are creating some routes, but what about the rest of your VPC? Internet gateway? Routes for the internet gateway? – Marcin Jul 26 '21 at 07:02
  • It isn't a custom one; I reused one created by a CloudFormation template for ECS. So far I'm only reusing the VPC and IGW. Actually, I've updated the ALB and VPC resources module; check out my update. I've got private subnets for ECS and public ones for the ALB. You can see my edits – diegoaguilar Jul 26 '21 at 07:10

1 Answer


Your container ports are 8080 and 6379. However, your target group says it's 80, so you have to double-check which ports you actually use on Fargate and adjust your TG accordingly.
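
Concretely, all three hops have to agree: listener, then target group, then container. A sketch using the names from your own config (values assumed from the question), with the health check left on "traffic-port" so it follows whatever port the TG uses:

resource "aws_alb_target_group" "target_group" {
  name        = "audible-blog-us-${var.environment}-target-group"
  port        = 8080                  # must equal the port the container listens on
  protocol    = "HTTP"
  vpc_id      = data.aws_vpc.vpc.id
  target_type = "ip"

  health_check {
    enabled = true
    path    = "/blog"
    port    = "traffic-port"          # follows the 8080 above
    matcher = "200-304"
  }
}

# In the ECS service's load_balancer block, container_port must match the task
# definition's containerPort (8080); the port-80 listener then just forwards here.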

There could be other issues as well that aren't yet apparent. For example, you are opening port 443, but there is no listener for it, so any attempt to use HTTPS will fail.
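
If you do want HTTPS, the ALB also needs a 443 listener with a certificate. A minimal sketch, assuming you have an ACM certificate ARN to reference (the certificate_arn variable is hypothetical):

variable "certificate_arn" {
  type = string   # hypothetical: ARN of an existing ACM certificate
}

resource "aws_alb_listener" "web_app_https" {
  load_balancer_arn = aws_alb.alb.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-2016-08"
  certificate_arn   = var.certificate_arn

  default_action {
    target_group_arn = aws_alb_target_group.target_group.arn
    type             = "forward"
  }
}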

Marcin
  • Right... Well, the application is exposed through port 8080. Regarding the health check, I'm a bit confused whether that should be fixed to 8080 or whether "traffic-port" is OK – diegoaguilar Jul 26 '21 at 07:48
  • @diegoaguilar Your TG port is the traffic-port, which should be 8080. Can you check your CFN template, since you wrote you have the same setup there? All ports should be correct in CFN if it works. – Marcin Jul 26 '21 at 07:50
  • I've changed the TG port to 8080 but still can't connect. I can tell that the ECS logs show the app bootstraps successfully, though – diegoaguilar Jul 26 '21 at 08:32
  • @diegoaguilar I don't know at the moment. If you have working CFN code, you can also show it. Maybe it will be easier to spot the differences. – Marcin Jul 26 '21 at 08:44
  • I originally tried to reuse the networking VPC and dependent resources from the template this wizard uses: https://aws.amazon.com/blogs/compute/building-deploying-and-operating-containerized-applications-with-aws-fargate/ However, I've modified it enough that I think it's not even relevant – diegoaguilar Jul 26 '21 at 21:22
  • What's relevant is what I just discovered: there might be a conflicting subnet. Look at my edits – diegoaguilar Jul 26 '21 at 21:23