Just to give some background about this error: what does `Resource is not in the state stackUpdateComplete` actually mean?
Well basically Amplify is telling you that one of the stacks in your app did not deploy correctly, but it doesn't know why (which is remarkably unhelpful, but in fairness it's deploying a lot of potentially complex resources).
This can make diagnosing and fixing the issue really problematic, so I've compiled the mental checklist that I go through to fix it. Each of the techniques will work some of the time, but I don't think any of them will work all of the time. This list is not intended to help you diagnose what causes the issue; it's literally just designed to get you back up and running.
## The fast options (will solve most problems)
- Check the AWS Console! Sometimes the amplify-cli will show an error, but it actually did what you asked it to in the cloud. Before proceeding, always check to make sure the error was actually fatal.
- Try running `amplify push --iterative-rollback`. It's supposed to roll your environment back to the last successful deployment but, to be honest, it rarely works.
- Try running `amplify push --force`. Although counter-intuitive, this is actually a rollback method. It basically does what you'd expect `--iterative-rollback` to do, but works more frequently.
- In the AWS console, go to the deployment bucket for your environment (the bucket will be named `amplify-${project_name}-${environment_name}-${some_random_numbers}-deployment`). If there is a file called `deployment-state.json`, delete it and try `amplify push` again from the CLI.
- If you are working in a team of more than one developer, or have your environment in several different repos locally, or across multiple different machines, your `amplify/team-provider-info.json` file might be out of sync. Usually this is caused by the environment variable(s) of an Amplify Lambda function being set in one copy of the file but not in another. The resolution will depend on how out of sync these files are, but you can normally just copy the contents of the last working `team-provider-info.json` across to the repo where the deployment is failing and run the deployment again. However, if you've got multiple devs/machines/repos, you might be better off diffing the files and checking where the differences are.
- If this still doesn't work, you can restore your env config from the one in the last working deployment in the cloud by running `amplify env pull --restore`.
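If you prefer the terminal to the console, the `deployment-state.json` step can be scripted with the AWS CLI. The bucket name here is a made-up example; substitute the real deployment bucket for your environment:

```shell
# Assumption: this bucket name is illustrative -- use your environment's
# amplify-${project_name}-${environment_name}-${some_random_numbers}-deployment bucket.
bucket="amplify-myproject-dev-123456-deployment"

# Check whether the stale state file exists, then delete it and retry the push.
aws s3 ls "s3://${bucket}/deployment-state.json"
aws s3 rm "s3://${bucket}/deployment-state.json"
amplify push
```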
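For the out-of-sync `team-provider-info.json` case, a sorted-key diff pinpoints the divergence quickly. The two files below are fabricated samples so the sketch is runnable as-is; in practice you'd point `diff` at the real file in each repo:

```shell
# Two illustrative copies of team-provider-info.json that have drifted:
# repo A sets an env var on a Lambda function that repo B is missing.
workdir="$(mktemp -d)"
cat > "${workdir}/repo-a.json" <<'EOF'
{"dev": {"categories": {"function": {"myFunction": {"envVarA": "some-value"}}}}}
EOF
cat > "${workdir}/repo-b.json" <<'EOF'
{"dev": {"categories": {"function": {"myFunction": {}}}}}
EOF

# jq -S sorts keys first, so the diff shows real drift rather than
# key-ordering noise. diff exits non-zero when the files differ, hence || true.
diff <(jq -S . "${workdir}/repo-a.json") <(jq -S . "${workdir}/repo-b.json") || true
```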
## The slow option (production-friendly)
Hopefully you haven't got this far, but at this point I'd recommend you open a ticket in the amplify-cli GitHub with as much info as you can. They tend to respond in 1-2 working days.
If you're pre-production, or you're having issues with a non-production environment, you could also try cloning the backend environment in the Amplify console, and seeing if you can get the stack working from there. If so, you can push the fixed deployment back to the previous env (if you want to) using `amplify env checkout ${your_old_env_name}` and then `amplify push`.
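Assuming the clone came up green and your original environment was called `dev` (an illustrative name), pushing the fix back looks like this:

```shell
# Illustrative env name -- substitute the name of your broken environment.
amplify env checkout dev   # point your local config back at the broken env
amplify push               # deploy the now-working backend definition to it
```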
## The complex option (solves more intricate problems with your stack)
If none of the above work (or you don't have time to wait for a response on a GitHub issue), head over to CloudFormation in the AWS console and search for the part of your stack that is erroring. There are a few different ways to do this:
- Check the CLI output for your last push and find the item whose status is something other than `UPDATE_COMPLETE`. You can copy the name of the stack and search for it in CloudFormation.
- Search CloudFormation for your environment name, click on any of the resulting stacks, click the link under **Parent stack**, and repeat until you find a stack with no parent. You are now in the root stack of your deployment; there are two ways to find your erroring stack from here:
  - Click on the **Resources** tab and find one with something red in the status column. Select the stack from this row.
  - Click on the **Events** tab and find one with something red in the status column. Select the stack from this row.
- Once you've found the broken stack, click the **Stack actions** button and select **Detect drift** from the dropdown menu.
- Click the **Stack actions** button again and select **View drift results** from the dropdown menu.
- On the **Resource drift results** page, you'll see a list of resources in the stack. If any of them show **DRIFTED** in the **Drift status** column, select the radio button to the left of that item and then click the **View drift details** button. The drift details will be displayed side by side, git-style, on the next page. You can also click the checkbox(es) in the list above to highlight the drift change(s). Keep the current page open; you'll need it later.
- Fixing the drift will depend on what it is. It's usually something in an IAM policy that's changed, which you can fix directly in the console. Sometimes it's a missing environment variable on a Lambda function, which you're better off fixing in the CLI (in which case you'll need to run `amplify push` again and wait for the build to complete in order for the fix to be deployed to your environment).
- Once you've fixed the drift, you can click the orange **Detect stack drift** button at the top of the page and the results will update. Hopefully you've solved the problem.
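The console steps above can also be driven from the AWS CLI, which is handy if you want to script the check. The stack name is an assumption (substitute your own); the commands themselves are standard CloudFormation ones:

```shell
# Assumption: substitute the name of your root (or failing nested) stack.
stack="amplify-myproject-dev-123456"

# Find the resources that failed in the last update (same info as the Events tab).
aws cloudformation describe-stack-events --stack-name "${stack}" \
  --query "StackEvents[?contains(ResourceStatus, 'FAILED')].[LogicalResourceId, ResourceStatus, ResourceStatusReason]" \
  --output table

# Kick off drift detection (the console's "Detect drift").
detection_id="$(aws cloudformation detect-stack-drift \
  --stack-name "${stack}" --query StackDriftDetectionId --output text)"

# Poll this until DetectionStatus is DETECTION_COMPLETE...
aws cloudformation describe-stack-drift-detection-status \
  --stack-drift-detection-id "${detection_id}"

# ...then list the drifted resources (the console's "View drift results").
aws cloudformation describe-stack-resource-drifts --stack-name "${stack}" \
  --stack-resource-drift-status-filters MODIFIED DELETED
```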
## GraphQL bonus round (completely bananas DDB drift)
Another fun thing that Amplify does from time to time is (seemingly spontaneously) changing the server-side encryption setting in the definition of some or all of your DynamoDB tables without you even touching them. This is far and away the most bizarre Amplify error I've encountered (and that's saying something)!
I have a sort-of fix for this, which is to open `amplify/backend/api/${your_api_name}/parameters.json` and change the `DynamoDBEnableServerSideEncryption` setting from `false` to `true`, save it, then run `amplify push`. This will fail. But it's fine, because then you just reverse the change (set it back to `false`), save it, push again and voila! I still cannot for the life of me understand how or why this happens.
I said it's a sort-of fix, and that's because you'll still see drift for the stacks that deploy the affected tables in CloudFormation. This goes away after a while. Again, I have no idea how or why.
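The toggle dance above can be scripted. The `parameters.json` here is a throwaway sample so the file edits are runnable as-is; the `amplify push` steps are left commented because they only make sense inside a real project:

```shell
# Throwaway sample standing in for amplify/backend/api/${your_api_name}/parameters.json.
params="$(mktemp -d)/parameters.json"
cat > "${params}" <<'EOF'
{ "DynamoDBEnableServerSideEncryption": false }
EOF

# Flip false -> true and push (this push is expected to fail).
jq '.DynamoDBEnableServerSideEncryption = true' "${params}" > "${params}.tmp" \
  && mv "${params}.tmp" "${params}"
# amplify push

# Reverse the change and push again -- this one should succeed.
jq '.DynamoDBEnableServerSideEncryption = false' "${params}" > "${params}.tmp" \
  && mv "${params}.tmp" "${params}"
# amplify push
```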
## The nuclear option (DO NOT USE IN PRODUCTION)
Obviously this one comes with a huge disclaimer: don't do this in production. If you're working with any kind of DB, you will lose the data.
You can make backups of everything and then start to remove the problematic resources one at a time, with an `amplify push` in between each one, until the stack builds successfully. Once it's built, you can start adding your resources back in.
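For the "make backups of everything" part, DynamoDB at least has on-demand backups built in. The table name below is illustrative; you'd run this once per table before deleting anything:

```shell
# Assumption: substitute each of your real table names (one call per table).
aws dynamodb create-backup \
  --table-name MyTable-abc123xyz-dev \
  --backup-name "MyTable-pre-nuke-$(date +%Y%m%d)"
```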
Hopefully this helps someone, please feel free to suggest edits or other solutions.