
This is sort of a general question about how dynamic port assignments are supposed to work, though my specific context is trying to figure out whether there is a natural way for a target group to learn the dynamically assigned port of the service without some manual piping to tell it.

The documentation for ECS dynamic port mapping (https://aws.amazon.com/premiumsupport/knowledge-center/dynamic-port-mapping-ecs) states that you just have to set the host port to 0 in the task definition, that no port needs to be specifically provided to the target group, and implies that it should just magically work. I've tried this before and couldn't get the pieces to talk to each other, though I can't specifically remember where the breakdown was.

Now I'm trying to do this with Terraform, and my issue is that, yes, I can set the task definition to use a host port of 0, but the port argument on the target-group resource is required to be present and non-zero. So, how is the other side of the dynamic port assignment supposed to work? I'm assuming that AWS solves the whole problem. Or does dynamic port assignment only handle the port-assignment half, with automation required to provide that port to the other side, and no AWS mechanism to do it for you? It seems like an obvious question that, for some reason, no one has posted any documentation or discussion about. I could use some clarification.

I'm specifically using an ALB (Application Load Balancer), but it may not matter.

Thank you.

Dustin Oprea

1 Answer


Keep in mind that I arrived here with a question of my own (which you will soon see), so I may not be the best person to adequately answer yours, but... the short version is: "it just magically works".

When you Terraform the load-balanced Service (which references a task definition), you have to attach an ALB. Attaching this ALB requires a container name and port:

resource "aws_ecs_service" "ecs_service" {
  cluster                            = aws_ecs_cluster.ecs_cluster.id
  iam_role                           = aws_iam_role.ecs_service_role.arn
  launch_type                        = "EC2"
  task_definition                    = "${aws_ecs_task_definition.ecs_container.family}:${aws_ecs_task_definition.ecs_container.revision}"
  propagate_tags                     = "SERVICE"
  enable_ecs_managed_tags            = true
  health_check_grace_period_seconds  = 120
  desired_count                      = 2
  deployment_minimum_healthy_percent = 50
  deployment_maximum_percent         = 200
  force_new_deployment               = true

  load_balancer {
    target_group_arn = aws_lb_target_group.ecs_front_end_targetgroup.arn
    container_name   = "FOO"
    container_port   = ?????
  }
}

In the task definition, your container has a port (e.g. 8080), but the host port is set to 0 so that you get a random port assignment on the Instance. The ECS Agent will automatically handle updating the Targets in the Target Group for you.
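As a sketch, the task-definition side might look like the following. The resource names, image, and port 8080 are illustrative; the relevant part is hostPort = 0, which asks ECS to pick an ephemeral host port per Task:

resource "aws_ecs_task_definition" "ecs_container" {
  family = "foo-task" # illustrative family name

  container_definitions = jsonencode([
    {
      name      = "FOO"
      image     = "my-registry/foo:latest" # illustrative image
      essential = true
      portMappings = [
        {
          containerPort = 8080 # the port the app listens on inside the container
          hostPort      = 0    # 0 = ECS assigns a random ephemeral port on the Instance
          protocol      = "tcp"
        }
      ]
    }
  ])
}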

However, when the Service with the load_balancer block is being instantiated, you technically don't know any of the "ephemeral ports" (i.e. the mapped high ports) with which to create the ALB's Target Group framework. Those ports haven't been assigned yet, and won't exist until the first Task is created. You can't use 0 because the argument wants a container port, not an Instance port.

The solution is to use the literal container port (e.g. 8080) here. This technique works: the Service is instantiated and creates a Target Group with Instances referencing the unmapped port 8080. Later, as Tasks are created, the ECS Agent comes along and backfills the group with other Targets using the working ephemeral ports.
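For reference, the corresponding target-group side might be sketched as follows (the names, VPC reference, and health-check path are illustrative). The required port argument is filled with the literal container port, and the health check uses "traffic-port" so each backfilled Target is checked on its own registered (ephemeral) port:

resource "aws_lb_target_group" "ecs_front_end_targetgroup" {
  name        = "foo-tg"        # illustrative name
  port        = 8080            # literal container port; satisfies the required, non-zero argument
  protocol    = "HTTP"
  target_type = "instance"      # dynamic host ports require instance targets, not "ip"
  vpc_id      = aws_vpc.main.id # illustrative VPC reference

  health_check {
    path = "/health"       # illustrative health-check path
    port = "traffic-port"  # check each Target on the port it registered with
  }
}

With this in place, the container_port in the Service's load_balancer block would also be the literal 8080.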

The only weird thing is that this creates one Target per Instance pointing at the unmapped 8080, which remains in a continual state of unhealthiness for obvious reasons. There is no cleanup action. The other Targets are fine, so the unhealthy ones are ignored; they also do not factor into the Desired count for Auto Scaling. But I'd love to know if there were a way to clean these up in automation.

  • I can manually unregister each failed unmapped Target, but this is a hassle.
  • Under the hood, what I've found this Terraform does is associate the ALB with both the Service and the Auto Scaling group. So I can also go to Auto Scaling and detach the ALB -- leaving the ALB attachment on the Service intact -- which kills all the unhealthy unmapped Targets in one fell swoop.

Most of the time, though, I just leave them as-is...

DarkSideGeek
  • Thanks for sharing. Accepting the presence of magic **and** one wasted instance for every deployment is going to be an irritation on principle. I'd especially be concerned about that one instance being a red herring that we lose time investigating, over the long term, every time we forget about it :) . – Dustin Oprea Oct 08 '22 at 21:10
  • I think you misunderstood. There is no "wasted instance". If you want 4 instances in your ECS cluster, you'll get and use 4 instances. No waste. The irritation is that each instance initializes a Target Group target whose healthcheck references the unmapped (non-ephemeral) Instance port which, although has no hope of ever getting healthy, is also rather benign. Which is why you can ignore them (unless you are OCD like me). The code above is good, not magic. The magic is in the semi-undocumented behavior under-the-hood. And if the explanation is good, please mark as answered. – DarkSideGeek Oct 11 '22 at 03:31
  • Got it. I voted it up, but I'll reserve the accepted answer for some theoretical answer, possibly not existing until a long time from now, that solves the issue in a cleaner way. Still, your method/observation is a useful contingency if the behavior is required in any form. – Dustin Oprea Oct 12 '22 at 07:50