
I have two EC2 instances running different applications, both served through CloudFront. Tonight both became unresponsive at the same time. Any attempt to reach the apps through CloudFront ends in a 504.

I attempted:

  • Rebooting the instances;
  • Stopping them completely, restarting;
  • Connecting through ssh on the console - connection times out.
  • Connecting through ssh using the AWS console - it stays stuck in "Establishing connection..."
  • Redeploying the apps (through CodeDeploy) - deployments succeeded, but the web apps are still unavailable.

I see that both instances are using very little CPU. A process I run on one of them is still running, because I see new log entries in CloudWatch. I also see that both web apps started successfully.

I don't know what else to do to troubleshoot this. How can I tell whether it's something I did or whether Amazon is having issues?

Edy Bourne
  • If AWS were having issues you would know, half the internet would be down. Are your instances in a public or private subnet? If they're in a public subnet, have you tried a direct ssh to them? If not, can you ssh to a bastion in a public subnet, then into them? Have you tried Session Manager - is that what you meant by "connecting through ssh on the console"? Have you looked at the EC2 screenshot, and the system logs you can see in the EC2 console? – Tim Jul 09 '23 at 04:20

1 Answer


Both ssh and http services being unresponsive would indicate that either both are busted or your IP networking is broken. Examine these instances in ways that do not require IP.

Set up your own health monitoring that checks whether the ssh and http ports are reachable, both from hosts in the same subnet and from outside, over the internet. It doesn't need to be fancy, as long as you can tell, to the minute, when 22/tcp is reachable and when it is not.
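As a rough illustration of how little is needed, here is a minimal sketch in Python using only the standard library. The function names, the default timeout, and the one-minute interval are assumptions made for this example, not anything from the question. Run it from a host in the same subnet and from one outside AWS, then compare the timestamps.

```python
import socket
import time
from datetime import datetime, timezone


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_once(host: str, ports: list[int]) -> dict[int, bool]:
    """Probe each port once and print one UTC-timestamped line per port."""
    results = {port: port_reachable(host, port) for port in ports}
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%SZ")
    for port, ok in results.items():
        state = "reachable" if ok else "UNREACHABLE"
        print(f"{stamp} {host}:{port}/tcp {state}")
    return results


def monitor(host: str, ports: list[int], interval: float = 60.0) -> None:
    """Probe forever, once per interval (seconds); stop with Ctrl+C."""
    while True:
        check_once(host, ports)
        time.sleep(interval)
```

Calling `monitor("203.0.113.10", [22, 80])` (a placeholder address; substitute the instance's own) prints one line per port every minute. Redirecting that output to a file gives a timeline of when each port was and was not reachable.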

Try a text console to examine networking configuration. AWS EC2 has a serial console for some instance types.

No shell at all makes things difficult. Restore a backup of the problem instances somewhere else, and examine log files that way.

Rebuild from scratch and see if the problem persists. Stand up a test environment isolated from production, but using the same infrastructure template and application deploys. Possibly in a different region, although that introduces variables.

John Mahowald
  • Thanks, by restoring a backup I found out the instance was OK, but shared code relied upon by both apps was stuck on a bogus http call without a timeout, and that caused the instance to stop responding for some reason. It's all good now, thanks! – Edy Bourne Jul 09 '23 at 19:49
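
The root cause reported in the comment above was an outbound HTTP call with no timeout. The original code and its language are not shown, so the following is only a sketch of the general remedy, written in Python with the standard library; `fetch_with_timeout` and its 5-second default are hypothetical names and values chosen for illustration.

```python
import urllib.request
from typing import Optional


def fetch_with_timeout(url: str, timeout: float = 5.0) -> Optional[bytes]:
    """Fetch url and return the body, or None if the request fails or stalls."""
    try:
        # The timeout bounds each blocking socket operation (connect, read),
        # so a peer that accepts the connection but never answers cannot
        # hang the caller indefinitely.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except OSError as exc:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        print(f"request to {url} failed: {exc}")
        return None
```

Note that the timeout applies per socket operation and not to the request as a whole, so a server that trickles data slowly can still hold the call open longer than the configured value.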