I'm experimenting with AWS ECS and Fargate. For starters, I wrote a very simple "hello world" application that just puts a timestamped file into an S3 bucket, packed it into a Docker image, and turned the whole thing into an ECS task.

When I run that task manually from the AWS console, everything is fine. However, what I want in the end is to launch the task (or several of them) from another application, which is written in Kotlin, using the AWS Java SDK.

I can launch the task without errors, and I see it appear in the ECS console. However, its status goes from PROVISIONING to PENDING and never reaches RUNNING; instead, the task terminates after hanging in PENDING for a while. To be clear, it is supposed to terminate once it has finished, so the shutdown itself is not the problem. The problem is that it never does its work before shutting down.

Here's the code that currently launches the task:

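import com.amazonaws.services.ecs.AmazonECS
import com.amazonaws.services.ecs.AmazonECSClientBuilder
import com.amazonaws.services.ecs.model.AwsVpcConfiguration
import com.amazonaws.services.ecs.model.LaunchType
import com.amazonaws.services.ecs.model.NetworkConfiguration
import com.amazonaws.services.ecs.model.RunTaskRequest

/**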
 * Service methods for AWS Elastic Container Service
 */
class FargateService(
    defaultSecurityGroups: List<String>,
    defaultSubnets: List<String>,
    private val client: AmazonECS = AmazonECSClientBuilder.standard().build(),
) {

    private val defaultNetworkConfiguration = NetworkConfiguration().withAwsvpcConfiguration(
        AwsVpcConfiguration().apply {
            setSecurityGroups(defaultSecurityGroups)
            setSubnets(defaultSubnets)
        }
    )

    private val log = logger()

    /**
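     * Launches a number of tasks from a task definition.
     * @param taskARN ARN of the task definition to launch.
     * @param clusterARN ARN of the cluster to launch the tasks in.
     * @param number How many tasks to launch in this cluster.
     * @param networkConfiguration Network settings for the tasks; defaults to the subnets and security groups passed to the constructor.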
     */
    fun launchTasks(
        taskARN: String,
        clusterARN: String,
        number: Int = 1,
        networkConfiguration: NetworkConfiguration = defaultNetworkConfiguration) {

        val runRequest = RunTaskRequest().apply {
            cluster = clusterARN
            launchType = LaunchType.FARGATE.name
            taskDefinition = taskARN
            count = number
        }.withNetworkConfiguration(networkConfiguration)

        val result = client.runTask(runRequest)
        log.info(result.toString())

    }
}

I'm using exactly the same configuration properties as when I launch the task manually, down to the subnet and security group, but the result is not the same. One thing I have to configure in the console but can't seem to configure here is the VPC ID itself. Since I only have one VPC, I'm not too worried about that.

Does anybody have experience with this sort of thing and see what I'm doing wrong?

UncleBob
  • If you toggle to `stopped` tasks in the console you should see all tasks that exited (for proper or improper reasons). If you click on a task that exited for improper reasons, it should tell you whether there was an issue (and what it was). If the task stays in pending for a long time and then fails, it is possible that it's being connected to a subnet that doesn't have a route to pull the image from a public endpoint (e.g. Docker Hub, or ECR not configured with PrivateLink). This is just an example. The console should hint at what it was. – mreferre Mar 09 '21 at 13:14
  • DUH, I hadn't realised, thanks a lot!! I did that, and found out that I was indeed getting a "CannotPullContainerError" as you suggested. After another while of experimenting I found out that this error happens if the launch type is Fargate and public IP assignment is disabled. Not sure *why* it needs a public IP to pull the container, but here we are... – UncleBob Mar 09 '21 at 13:47
  • Yeah, that is very common. To be clear, it doesn't necessarily need a public IP. It needs a public IP if it lands on a public subnet (simply because without a public IP the IGW won't route it out). You could also deploy the task in a private subnet, but in that case you need a NAT GW to route out to the Internet. Glad it's working now. – mreferre Mar 09 '21 at 14:18

1 Answer

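The issue was that public IP assignment was disabled in the network configuration. With the Fargate launch type and a task placed in a public subnet, public IP assignment in the network configuration must be "ENABLED"; otherwise the task cannot reach the image registry and you get a "CannotPullContainerError". As mreferre explains in the comments on the question, a task in a private subnet does not need a public IP, but then it needs a NAT gateway to route out to the Internet.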
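A minimal sketch of that change, using the same AWS SDK for Java (v1) classes as the question. The helper name `publicNetworkConfiguration` is hypothetical and only for illustration; the one line that differs from the question's code is the `withAssignPublicIp` call.

```kotlin
import com.amazonaws.services.ecs.model.AssignPublicIp
import com.amazonaws.services.ecs.model.AwsVpcConfiguration
import com.amazonaws.services.ecs.model.NetworkConfiguration

// Hypothetical helper (not part of the original code): builds an awsvpc
// network configuration that requests a public IP for the task's network
// interface, so a task in a public subnet can reach the image registry.
fun publicNetworkConfiguration(
    securityGroups: List<String>,
    subnets: List<String>,
): NetworkConfiguration =
    NetworkConfiguration().withAwsvpcConfiguration(
        AwsVpcConfiguration()
            .withSecurityGroups(securityGroups)
            .withSubnets(subnets)
            // The question's code never set this value.
            .withAssignPublicIp(AssignPublicIp.ENABLED)
    )
```

Passing the result as the `networkConfiguration` argument of `launchTasks` should be enough, assuming the subnets are public ones with a route to an internet gateway.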

It took me a while to figure out, first because I didn't realise how to check error messages during launch (which mreferre helpfully pointed out in a comment above), but also because when I checked the default settings for launching tasks manually from the console, public IP assignment showed DISABLED. It flips to ENABLED automatically once you change the launch type to Fargate, but I hadn't noticed that.
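The same error information can also be read through the SDK instead of the console. The following is a rough sketch, not a tested implementation: `reportTaskProblems` is a hypothetical helper, and it assumes it is called some time after `runTask`, once the tasks have had a chance to stop, since `stoppedReason` is empty while a task is still pending.

```kotlin
import com.amazonaws.services.ecs.AmazonECS
import com.amazonaws.services.ecs.model.DescribeTasksRequest
import com.amazonaws.services.ecs.model.RunTaskResult

// Hypothetical helper: prints why tasks from a RunTask call failed or stopped.
fun reportTaskProblems(client: AmazonECS, clusterARN: String, result: RunTaskResult) {
    // Failures that RunTask reports immediately, before any task is created.
    result.failures.forEach { println("RunTask failure for ${it.arn}: ${it.reason}") }

    val taskArns = result.tasks.map { it.taskArn }
    if (taskArns.isEmpty()) return

    // Look up the current state of the tasks that were actually started.
    val described = client.describeTasks(
        DescribeTasksRequest().withCluster(clusterARN).withTasks(taskArns)
    )
    described.tasks.forEach { task ->
        println("${task.taskArn}: lastStatus=${task.lastStatus}, stoppedReason=${task.stoppedReason}")
        // Per-container reasons can carry additional detail.
        task.containers.forEach { c -> println("  container ${c.name}: reason=${c.reason}") }
    }
}
```

In `launchTasks`, this could be called with the `result` returned by `client.runTask(runRequest)`, after a delay long enough for the task to leave the PENDING state.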

UncleBob