7

I'm trying to set up a couple of services with ECS Fargate, provisioned via Terraform. They all use the same module; only the image, ALB target group, environment variables and port mappings differ.

Two out of three services start their tasks successfully. Only one (unfortunately the main service) doesn't want to start and shows `Network bindings - not configured` for the container. The port I'm using is 80.

The task definition has the correct port mappings.

I've tried changing the port (to 8080), using multiple port mappings and recreating the service multiple times, to no effect.

Of course the task gets killed by the load balancer for failing health checks.

Any pointers on what could be wrong? I found some GitHub issues about this from 2017, but they concern EC2-backed ECS instances and are claimed to have been fixed.

For reference, here's the task definition JSON:

{
  "ipcMode": null,
  "executionRoleArn": "ROLE_ARN",
  "containerDefinitions": [
    {
      "dnsSearchDomains": null,
      "logConfiguration": {
        "logDriver": "awslogs",
        "secretOptions": null,
        "options": {
          "awslogs-group": "/drone",
          "awslogs-region": "eu-central-1",
          "awslogs-stream-prefix": "drone-server/"
        }
      },
      "entryPoint": null,
      "portMappings": [
        {
          "hostPort": 80,
          "protocol": "tcp",
          "containerPort": 80
        }
      ],
      "command": null,
      "linuxParameters": null,
      "cpu": 256,
      "environment": [...],
      "resourceRequirements": null,
      "ulimits": null,
      "dnsServers": null,
      "mountPoints": [],
      "workingDirectory": null,
      "secrets": [...],
      "dockerSecurityOptions": null,
      "memory": 512,
      "memoryReservation": 512,
      "volumesFrom": [],
      "stopTimeout": 30,
      "image": "drone/drone:1",
      "startTimeout": null,
      "dependsOn": null,
      "disableNetworking": null,
      "interactive": null,
      "healthCheck": null,
      "essential": true,
      "links": null,
      "hostname": null,
      "extraHosts": null,
      "pseudoTerminal": null,
      "user": null,
      "readonlyRootFilesystem": false,
      "dockerLabels": null,
      "systemControls": null,
      "privileged": null,
      "name": "drone-server"
    }
  ],
  "placementConstraints": [],
  "memory": "512",
  "taskRoleArn": "ROLE_ARN",
  "compatibilities": [
    "EC2",
    "FARGATE"
  ],
  "taskDefinitionArn": "TASK_DEFINITION_ARN",
  "family": "drone-server",
  "requiresAttributes": [
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.execution-role-awslogs"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.21"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.task-iam-role"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.container-ordering"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.secrets.ssm.environment-variables"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.task-eni"
    }
  ],
  "pidMode": null,
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "networkMode": "awsvpc",
  "cpu": "256",
  "revision": 14,
  "status": "ACTIVE",
  "proxyConfiguration": null,
  "volumes": []
}

2 Answers

3

With ECS on EC2, your container port (like 80) is mapped to a dynamic port on the host (like 35467), and ECS then registers this host port with the TargetGroup of type 'instance'. (Technically, this happens if you send a zero as the host port mapped to port 80 on the container. AWS takes this as 'dynamically assign a port on the host'.)

The big difference in Fargate is that it uses ENIs attached to each task for networking, so each task gets its own private IP address (it can have a public one as well, if you want).

Then, with that unique IP address (as opposed to an instance-unique port), it registers the IP address with port 80 to the TargetGroup of type 'ip'.

So two things could be going wrong: first, on Fargate your task must have the same host port and container port (e.g. 80:80); second, you must be sure it's registering to a TargetGroup with type 'ip'.
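Those two checks can be sketched as a small Python helper. This is only an illustration: the function name `fargate_port_problems` and its `target_type` argument are made up here, and the dictionary layout simply mirrors the task definition JSON from the question.

```python
def fargate_port_problems(task_def, target_type):
    """Return a list of problems; an empty list means both checks pass."""
    problems = []

    # Fargate tasks use awsvpc networking: each task gets its own ENI and IP.
    if task_def.get("networkMode") != "awsvpc":
        problems.append("networkMode should be 'awsvpc' on Fargate")

    # With awsvpc there is no dynamic host port, so hostPort must equal
    # containerPort (or be left out entirely).
    for container in task_def.get("containerDefinitions", []):
        for mapping in container.get("portMappings") or []:
            host_port = mapping.get("hostPort")
            container_port = mapping.get("containerPort")
            if host_port is not None and host_port != container_port:
                problems.append(
                    f"{container.get('name')}: hostPort {host_port} "
                    f"!= containerPort {container_port}"
                )

    # Tasks are registered by IP address, so the target group must be type 'ip'.
    if target_type != "ip":
        problems.append(f"target group type is '{target_type}', expected 'ip'")

    return problems


# Trimmed-down version of the task definition from the question.
drone_server = {
    "networkMode": "awsvpc",
    "containerDefinitions": [
        {
            "name": "drone-server",
            "portMappings": [
                {"hostPort": 80, "protocol": "tcp", "containerPort": 80}
            ],
        }
    ],
}

print(fargate_port_problems(drone_server, "ip"))        # both checks pass
print(fargate_port_problems(drone_server, "instance"))  # flags the target group type
```

The task definition in the question passes both checks when the target group is of type 'ip', which suggests the cause may lie elsewhere.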

I am not a Terraform user, so I'm not sure how much of that is in your control, but I suspect one of those two things is not right and is causing your web service/task to not launch correctly.

Brett Green
  • I added the Terraform comment to show that the configuration itself is working (on 2 services). The target group is of type IP and the ports are all mapped correctly – Markus Mühlberger Aug 09 '19 at 15:31
  • 1
    Must be internal to the container then... is it actually running internally and bound to something like '0.0.0.0' so it can attach to whatever IP address has been assigned to it? – Brett Green Aug 09 '19 at 15:34
  • The container I'm running is `drone/drone:1`, something that (until today) was run on EC2-backed ECS without any issues. (`drone/autoscaler` and `drone/amazon-secrets` are the successfully running containers). – Markus Mühlberger Aug 09 '19 at 15:54
  • 1
    Yeah, I am assuming this last one is running a web server which is not binding to the dynamically assigned IP address. I think it likely that your ECS task/service definitions are fine now, but you need to make a change to how the web server starts up on the container to bind to 0.0.0.0 (which means bind to any IP address). It may have previously been bound to localhost or 127.0.0.1 which would work in EC2, but not on Fargate. Here's a similar question that uses rails: https://stackoverflow.com/questions/29083885/what-does-binding-a-rails-server-to-0-0-0-0-buy-you – Brett Green Aug 09 '19 at 16:10
  • 1
    Thanks @BrettGreen your comment was the key thing that helped me solve my issue! – Daniel Corbett Dec 03 '21 at 22:35
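To illustrate the bind-address point from the comments, here is a generic Python sketch. It is not how `drone/drone` starts its server (the thread does not show that); the helper names `reachable_from_other_hosts` and `open_listener` are invented for this example.

```python
import ipaddress
import socket


def reachable_from_other_hosts(bind_host):
    """A loopback bind only accepts connections from the same network namespace."""
    return not ipaddress.ip_address(bind_host).is_loopback


def open_listener(bind_host):
    """Open a TCP listener on an OS-chosen free port (port 0) and return the socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((bind_host, 0))
    sock.listen(1)
    return sock


for host in ("127.0.0.1", "0.0.0.0"):
    listener = open_listener(host)
    address, port = listener.getsockname()
    print(
        f"bound to {address}:{port}, "
        f"reachable from other hosts: {reachable_from_other_hosts(host)}"
    )
    listener.close()
```

A load balancer health check arrives over the task's ENI, so only a listener bound to `0.0.0.0` (or to the task's own IP address) would be able to answer it.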
0

Apparently Fargate is not very good at reporting errors or displaying state. It doesn't show all the environment variables or the correct status in the AWS console, but somehow works anyway.

Moral of the story: if something doesn't show up in the console, make sure to test whether it actually doesn't work.

I honestly can't offer a solution to my issue, since it went away when I turned on trace logging on the Drone CI server via an environment variable.

  • 1
    Hmmmm... one problem I had when switching to Fargate is that tasks take a little longer to start up. In some cases, this may cause ELB health checks to prematurely consider the task unhealthy. This would be an intermittent problem. Setting the 'Health Check Grace Period' on the service and/or changing the health check frequency settings could alleviate the problem if it occurs again. – Brett Green Aug 16 '19 at 15:35
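
As a rough sketch of the grace-period suggestion: the ECS `UpdateService` API accepts a `healthCheckGracePeriodSeconds` parameter, which could be set from Python roughly like this. The helper names, the cluster name `my-cluster` and the value of 120 seconds are assumptions for illustration; `ecs_client` is expected to be something like `boto3.client("ecs")`, which this snippet does not import.

```python
def grace_period_request(cluster, service, seconds):
    """Build the keyword arguments for an ECS UpdateService call."""
    if seconds < 0:
        raise ValueError("grace period must not be negative")
    return {
        "cluster": cluster,
        "service": service,
        "healthCheckGracePeriodSeconds": seconds,
    }


def apply_grace_period(ecs_client, cluster, service, seconds):
    """Send the update through any client exposing update_service()."""
    return ecs_client.update_service(
        **grace_period_request(cluster, service, seconds)
    )


# Hypothetical names and value: pick a grace period that is longer than the
# start-up time you actually observe for the task.
print(grace_period_request("my-cluster", "drone-server", 120))
```

With Terraform, the equivalent setting should be the `health_check_grace_period_seconds` argument on the `aws_ecs_service` resource.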