29

I'm trying to launch/run a Dockerfile on AWS using their ECS service. I can run my docker image locally just fine, but it's failing on the Fargate launch type. I've uploaded my Docker image to ECR, and I've created a cluster/service/task from it.

However, my cluster's task status simply reads "DEPROVISIONING (Task failed to start)", but it provides no logs or details of the output of my running image, so I have no idea what's wrong. How do I find more information and diagnose why ECS isn't able to run my image?

Cerin
  • I'm using Fargate, though my knowledge is limited because the deployment pipeline was built for me. We have container `/dev/stdout` going to CloudWatch logging - could this be the case for you also? – halfer May 20 '19 at 23:15
  • As per my experience, troubleshooting is one of the most difficult tasks when launching an image with Fargate. I always had to resort to trial and error, checking the network settings and the image configuration. You can configure CloudWatch in your task definition and see the logs there. For me, CloudWatch only created logs after the container had launched at least once, but you can still give it a try. – bot May 21 '19 at 15:31

6 Answers

19

Please go to Clusters > Tasks > Details > Containers.

You should see an error message in the area marked by the red rectangle in the "Error message" figure below.

Task detail:

[Screenshot: the task detail page]

Error message:

[Screenshot: the error message in the container details]
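
If you prefer the CLI to the console, the same stopped reason is exposed by the ECS API. A rough sketch (the cluster name and task ID are placeholders for your own values):

# List recently stopped tasks in the cluster
aws ecs list-tasks --cluster my-cluster --desired-status STOPPED

# Show why a particular task stopped
aws ecs describe-tasks --cluster my-cluster --tasks <task-id> \
    --query 'tasks[].stoppedReason'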

Yasu
    Yes, I know how to find the task status. That's where I found the text "DEPROVISIONING (Task failed to start)". Unfortunately, that's not a helpful error message. I need to know *why* it failed to start. – Cerin May 21 '19 at 12:46
15

I may be late to the party, but you can check the container logs instead of the task's.

Go to the failed task -> Details -> Container (at the bottom) and open it. Right under details you'll see a Status reason.

[Screenshot: opening the container details]

[Screenshot: getting the reason for failure]

Note: if your task runs more than one container, check the 'Status reason' of each container as per the screenshot above, as it can be different between them.
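
If the console is fiddly, the per-container exit code and status reason are also available from the CLI. A minimal sketch, assuming the cluster name and task ID are replaced with yours:

# Each container in the task reports its own exit code and status reason
aws ecs describe-tasks --cluster my-cluster --tasks <task-id> \
    --query 'tasks[].containers[].{name:name,exitCode:exitCode,reason:reason}'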

Radu Diță
1

As Abhinav says, the message isn't very descriptive (and the CLI `aws ecs describe-tasks` doesn't add anything more). The only options are to log into the host EC2 instance and read the logs there, or to send those logs to CloudWatch: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_cloudwatch_logs.html#cwlogs_user_data
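
If the task definition already uses the awslogs log driver, you can find the configured log group and tail it from the CLI. A hedged sketch (the task definition and log group names are placeholders; `aws logs tail` requires AWS CLI v2):

# See which log driver/group each container is configured with
aws ecs describe-task-definition --task-definition my-task \
    --query 'taskDefinition.containerDefinitions[].logConfiguration'

# Follow that log group (AWS CLI v2 only)
aws logs tail /ecs/my-task --follow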

The most likely cause (in ECS) is that the cluster doesn't have enough resources to launch the new task. You can sometimes work out the cause from the Metrics tab, or, since mid-2019 (depending on your region, I guess), you can enable "CloudWatch Container Insights" from ECS Account Settings to get more detailed information about memory and CPU reservations.
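
Container Insights can also be switched on per cluster from the CLI rather than the Account Settings page; a small sketch with a placeholder cluster name:

# Enable CloudWatch Container Insights for an existing cluster
aws ecs update-cluster-settings --cluster my-cluster \
    --settings name=containerInsights,value=enabled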

andrew lorien
1

None of those methods worked for me. What worked was marking just one of the services as essential (only the one you are sure is going to work), then looking at the CloudWatch logs, and eventually even the ECS logs on the EC2 instance (see the sketch after the ecs-params.yml below).

# ecs-params.yml

version: 1
task_definition:
  services:
    myservice1:
      essential: true
    myservice2:
      essential: false
    myservice3:
      essential: false
    myservice4:
      essential: false
    myservice5:
      essential: false

ECS's black box is not very friendly after all.
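
For the "ECS logs on the EC2 instance" part, note that this only applies to the EC2 launch type (there is no host to SSH into on Fargate). On the ECS-optimized AMI the agent logs live roughly here; a sketch, assuming you can SSH to the container instance:

# ECS agent and init logs on the container instance (ECS-optimized AMI)
cat /var/log/ecs/ecs-agent.log*
cat /var/log/ecs/ecs-init.log

# The stopped container's own output, if it got far enough to start
docker ps -a
docker logs <container-id>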

lowercase00
1

Go to ECS -> Cluster -> Service (select your service name) -> Events

Then click on one of the tasks that failed to start (the long UUID in the list of events), like this:

[Screenshot: the service's Events tab listing task UUIDs]

Make sure to select a task that already failed so that you can see why it failed -- don't select one of the tasks that the ECS Service is still trying to start, and thus hasn't failed yet (remember that ECS will keep trying to start tasks up until the timeout period is over). So, a failed task will look like the following screenshot, and you should see why it failed to start. In my case, for example, this task failed to start because it doesn't have the proper IAM roles:

[Screenshot: failed task details showing the stopped reason]

With those details, you can make the fix (in my case, I just needed to update my ECS Task role to include a bunch of secretsmanager access and kms:Decrypt).
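
For reference, the missing permissions can be granted with an inline policy on the role. This is only a hedged sketch with hypothetical role and policy names (and whether it belongs on the task role or the task execution role depends on whether ECS injects the secrets for you or your application fetches them itself); scope the Resource down in real use:

# Hypothetical role and policy names; restrict Resource to your secrets/keys
aws iam put-role-policy --role-name my-ecs-task-role \
    --policy-name AllowSecretsAndKmsDecrypt \
    --policy-document '{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": ["secretsmanager:GetSecretValue", "kms:Decrypt"],
        "Resource": "*"
      }]
    }'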

Pierre
0

You can get some information regarding the task failure under the 'Events' tab of your service's dashboard. Though the messages there aren't very descriptive, they can give you a rough idea of where things are going wrong.

[Screenshot: the service's Events tab]
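
The same events can be pulled from the CLI if you prefer; a small sketch with placeholder names:

# Show the ten most recent events for the service
aws ecs describe-services --cluster my-cluster --services my-service \
    --query 'services[].events[:10]'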

Abhinav Khare