29

I'm trying to launch/run a Dockerfile on AWS using their ECS service. I can run my docker image locally just fine, but it's failing on the Fargate launch type. I've uploaded my Docker image to ECR, and I've created a cluster/service/task from it.

However, my cluster's task status simply reads "DEPROVISIONING (Task failed to start)", but it provides no logs or details of the output of my running image, so I have no idea what's wrong. How do I find more information and diagnose why ECS isn't able to run my image?

Cerin
  • I'm using Fargate, though my knowledge is limited because the deployment pipeline was built for me. We have container `/dev/stdout` going to CloudWatch logging - could this be the case for you also? – halfer May 20 '19 at 23:15
  • As per my experience, troubleshooting is one of the most difficult tasks when launching an image with Fargate. I always had to resort to trial and error, checking the network settings and the image configuration. You can configure CloudWatch in your task definition and see the logs there. For me, CloudWatch only created logs after the container had launched at least once, but you can still give it a try. – bot May 21 '19 at 15:31

6 Answers

19

Please go to Clusters > Tasks > Details > Containers.

You should see an error message in the area marked by the red rectangle in the "Error message" figure below.

Task detail:

[Screenshot: the task detail page]

Error message:

[Screenshot: the error message in the container details]
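
If you prefer the CLI to the console, the same stopped reason is exposed by the ECS API. A rough sketch (the cluster name and task ID are placeholders for your own values):

# List recently stopped tasks in the cluster
aws ecs list-tasks --cluster my-cluster --desired-status STOPPED

# Show why a particular task stopped
aws ecs describe-tasks --cluster my-cluster --tasks <task-id> \
    --query 'tasks[].stoppedReason'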

Yasu
    Yes, I know how to find the task status. That's where I found the text "DEPROVISIONING (Task failed to start)". Unfortunately, that's not a helpful error message. I need to know *why* it failed to start. – Cerin May 21 '19 at 12:46
15

I may be late to the party, but you can check the container logs instead of the task's.

Go to the failed task -> Details -> Container (at the bottom) and open it. Right under details you'll see a Status reason.

[Screenshot: opening the container details]

[Screenshot: getting the reason for failure]

Note: if your task runs more than one container, check the 'Status reason' of each container as per the screenshot above, as it can be different between them.
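
If the console is fiddly, the per-container exit code and status reason are also available from the CLI. A minimal sketch, assuming the cluster name and task ID are replaced with yours:

# Each container in the task reports its own exit code and status reason
aws ecs describe-tasks --cluster my-cluster --tasks <task-id> \
    --query 'tasks[].containers[].{name:name,exitCode:exitCode,reason:reason}'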

Radu Diță
1

As Abhinav says, the message isn't very descriptive (and the CLI `aws ecs describe-tasks` doesn't add anything more). The only options are to log into the host EC2 instance and read the logs there, or to send those logs to CloudWatch: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_cloudwatch_logs.html#cwlogs_user_data
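
If the task definition already uses the awslogs log driver, you can find the configured log group and tail it from the CLI. A hedged sketch (the task definition and log group names are placeholders; `aws logs tail` requires AWS CLI v2):

# See which log driver/group each container is configured with
aws ecs describe-task-definition --task-definition my-task \
    --query 'taskDefinition.containerDefinitions[].logConfiguration'

# Follow that log group (AWS CLI v2 only)
aws logs tail /ecs/my-task --follow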

The most likely cause (in ECS) is that the cluster doesn't have enough resources to launch the new task. You can sometimes work out the cause from the Metrics tab, or, since mid-2019 (depending on your region, I guess), you can enable "CloudWatch Container Insights" from ECS Account Settings to get more detailed information about memory and CPU reservations.
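
Container Insights can also be switched on per cluster from the CLI rather than the Account Settings page; a small sketch with a placeholder cluster name:

# Enable CloudWatch Container Insights for an existing cluster
aws ecs update-cluster-settings --cluster my-cluster \
    --settings name=containerInsights,value=enabled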

andrew lorien
1

None of those methods worked for me. What worked was marking just one of the services as essential (only the one you are sure is going to work), then looking at the CloudWatch logs, and eventually even the ECS logs on the EC2 instance (see the sketch after the ecs-params.yml below).

# ecs-params.yml

version: 1
task_definition:
  services:
    myservice1:
      essential: true
    myservice2:
      essential: false
    myservice3:
      essential: false
    myservice4:
      essential: false
    myservice5:
      essential: false

ECS's black box is not very friendly after all.
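
For the "ECS logs on the EC2 instance" part, note that this only applies to the EC2 launch type (there is no host to SSH into on Fargate). On the ECS-optimized AMI the agent logs live roughly here; a sketch, assuming you can SSH to the container instance:

# ECS agent and init logs on the container instance (ECS-optimized AMI)
cat /var/log/ecs/ecs-agent.log*
cat /var/log/ecs/ecs-init.log

# The stopped container's own output, if it got far enough to start
docker ps -a
docker logs <container-id>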

lowercase00
1

Go to ECS -> Cluster -> Service (select your service name) -> Events

Then click on one of the tasks that failed to start (the long UUID in the list of events), like this:

[Screenshot: the service's Events tab listing task UUIDs]

Make sure to select a task that already failed so that you can see why it failed -- don't select one of the tasks that the ECS Service is still trying to start, and thus hasn't failed yet (remember that ECS will keep trying to start tasks up until the timeout period is over). So, a failed task will look like the following screenshot, and you should see why it failed to start. In my case, for example, this task failed to start because it doesn't have the proper IAM roles:

[Screenshot: failed task details showing the stopped reason]

With those details, you can make the fix (in my case, I just needed to update my ECS Task role to include a bunch of secretsmanager access and kms:Decrypt).
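
For reference, the missing permissions can be granted with an inline policy on the role. This is only a hedged sketch with hypothetical role and policy names (and whether it belongs on the task role or the task execution role depends on whether ECS injects the secrets for you or your application fetches them itself); scope the Resource down in real use:

# Hypothetical role and policy names; restrict Resource to your secrets/keys
aws iam put-role-policy --role-name my-ecs-task-role \
    --policy-name AllowSecretsAndKmsDecrypt \
    --policy-document '{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": ["secretsmanager:GetSecretValue", "kms:Decrypt"],
        "Resource": "*"
      }]
    }'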

Pierre
0

You can get some information regarding the task failure under the 'Events' tab of your service's dashboard. Though the messages there aren't very descriptive, they can give you a rough idea of where things are going wrong.

[Screenshot: the service's Events tab]
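
The same events can be pulled from the CLI if you prefer; a small sketch with placeholder names:

# Show the ten most recent events for the service
aws ecs describe-services --cluster my-cluster --services my-service \
    --query 'services[].events[:10]'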

Abhinav Khare