EC2 instance unreachable (both HTTP and SSH) after unknown amount of time

Question

I have an Amazon Linux instance running nothing but WordPress (installed by following the Amazon guide). After some amount of time (I don't know exactly how long, but at least a few hours), the instance becomes unreachable over both HTTP (the browser tries to open the website forever) and SSH (message: ssh_exchange_identification: read: Connection reset by peer).

When I log into the console I see that the status light is green and everything seems to be OK. Even rebooting the instance doesn't fix the problem. The only solution is stopping the instance and starting it again, after which the whole cycle repeats: it works for some hours and then suddenly stops responding.

Maybe it's important to mention that I have purchased a reserved Linux instance matching my EC2 instance. So I'm using that. Any ideas that could help me fix this problem?

Update: I made a snapshot of my instance and put it on a new volume, but that didn't solve the problem either.

Current security group (inbound):

HTTP and HTTPS from everywhere
SSH only from my IP

I checked access_log in the httpd log folder. It contains very few lines, and a couple of them sometimes mention odd Russian websites. That is strange, because my website is not officially online and nothing links to it anywhere.

A couple of questions here. 1. Did you check the auth log of your server? 2. Did you check the web server log? 3. Did you set up a security group for your instance? 4. Have you enabled SSH only from specific IPs? Kindly update your question with these inputs. — Shailesh Sutar, Sep 19 '17 at 16:35
Reserved instance doesn't make a difference to operation, that's a billing thing only. Rebooting restarts the operating system, stop / start moves you to new hardware. I doubt you've found multiple faulty pieces of hardware, and security groups / network ACLs are reliable. Make sure your security groups and NACLs are open to the world in case it's a dynamic IP from your client. Next guess is it's most likely something wrong with your operating system, but there's not enough information in your question to know for sure. — Tim, Sep 19 '17 at 20:13
@ShaileshSutar Sorry for answering so late, I thought the problem was solved, but apparently it's still there. Please check my update. I'm checking all my logs, as soon as I have something useful I will update. — Nima, Sep 28 '17 at 20:52
@Tim You mean that I should open SSH to the world? As I have HTTP already open. — Nima, Sep 28 '17 at 20:52
@Tim I have an Amazon Linux instance, you think there may be a problem with that? How can I check it? The only way to install Wordpress using Amazon guide was to have an Amazon instance, otherwise I would go with Ubuntu. — Nima, Sep 28 '17 at 21:10

score 2 · Accepted Answer · answered Sep 28 '17 at 22:34

In my opinion your server or its disk has become corrupt and needs to be fully replaced. Here's what I'd do.

First up, you have a choice of approach:

Set up a new Ubuntu 16.04 Linux server on EC2. There's an AMI, so it's easy. I find Amazon Linux has less support and fewer packages available than Ubuntu. Use EasyEngine or similar to make this simpler.
Use a premade WordPress AMI, such as Bitnami. I think they're still on Ubuntu 14.04, but you can easily upgrade to 16.04.

Another choice is whether to use RDS or run MySQL on the instance itself. I use MySQL on the instance because it's cheaper and works just fine.

Next

Set up your instance to run Wordpress, if you didn't use a premade AMI. Check the basic install works before moving on.
Do a MySQL dump from the original server, then import the data into your database.
Copy your data onto the new server. This will be the wp-content directory.
There will be some fiddling and tweaking to get this working properly.
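
The dump and import steps above could be sketched roughly as follows. This is only an illustration under assumptions: the database name, user, and dump path you pass in are placeholders you would replace with your own, and the helper function names are made up for this example. It relies on the standard mysqldump and mysql command-line clients being installed.

```python
import subprocess
from typing import List


def dump_command(database: str, user: str, dump_file: str) -> List[str]:
    """Build the mysqldump command to run on the original server."""
    return [
        "mysqldump",
        "--single-transaction",        # consistent snapshot for InnoDB tables
        f"--user={user}",
        "--password",                  # prompt rather than expose the password in argv
        f"--result-file={dump_file}",  # write the SQL dump to this file
        database,
    ]


def import_command(database: str, user: str) -> List[str]:
    """Build the mysql command that reads the dump from standard input."""
    return ["mysql", f"--user={user}", "--password", database]


def run_dump(database: str, user: str, dump_file: str) -> None:
    """Run on the old server: write the database to dump_file."""
    subprocess.run(dump_command(database, user, dump_file), check=True)


def run_import(database: str, user: str, dump_file: str) -> None:
    """Run on the new server: load dump_file into an existing, empty database."""
    with open(dump_file, "rb") as handle:
        subprocess.run(import_command(database, user), stdin=handle, check=True)
```

You would call run_dump on the old server, copy the dump file across along with wp-content (for example with scp or rsync), then call run_import on the new one. The target database has to exist before the import.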

If this ends up being unreliable, it may be something weird in your database. In that case, export your posts and content as XML from the old site, then import them into the fresh install. I would be surprised if this was necessary.

Misc

To answer your question above, only open SSH to your own IP.
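
Because the SSH rule allows only one address, it is worth confirming that your current public IP still matches it before blaming the server. Here is a minimal sketch using Python's standard ipaddress module. The function name is made up for this example, and the addresses in the usage note are documentation placeholders, not values from the question.

```python
import ipaddress


def ssh_rule_allows(client_ip: str, allowed_cidr: str) -> bool:
    """Return True if client_ip falls inside the CIDR used in the SSH inbound rule."""
    network = ipaddress.ip_network(allowed_cidr, strict=False)
    return ipaddress.ip_address(client_ip) in network
```

For example, ssh_rule_allows("203.0.113.11", "203.0.113.10/32") returns False, which is what you would see after your ISP hands you a new address. This only rules out one cause of the SSH failure. It does not explain HTTP being unreachable, since that rule is already open to everyone.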

score 1 · Answer 2 · edited Dec 12 '17 at 17:28

I faced the same problem but not as frequently as you mentioned.

In my case the server became unresponsive because it ran low on memory, which the AWS console does not monitor.

A cron job was unable to fork.

It's worth checking your cron logs after a stop/start; you may see a line like this:

crond[2656]: (CRON) can't fork (do_command): Cannot allocate memory
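
If the log is long, a small script can pick out such lines. This is a rough sketch, not a polished tool. The default path /var/log/cron is an assumption (it is the usual location on Red Hat style systems, so check where your instance writes cron output), and the function names are made up for this example.

```python
import re
from typing import Iterable, List

# Matches messages such as:
#   crond[2656]: (CRON) can't fork (do_command): Cannot allocate memory
ALLOC_FAILURE = re.compile(r"can't fork|Cannot allocate memory", re.IGNORECASE)


def find_alloc_failures(lines: Iterable[str]) -> List[str]:
    """Return the log lines that report a fork or memory allocation failure."""
    return [line.rstrip("\n") for line in lines if ALLOC_FAILURE.search(line)]


def scan_log(path: str = "/var/log/cron") -> List[str]:
    """Scan a log file (the default path is an assumption) for matching lines."""
    with open(path, errors="replace") as handle:
        return find_alloc_failures(handle)
```

Calling scan_log() after the next stop/start and printing the result shows whether memory exhaustion preceded the hang. Reading the file usually requires root.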

To avoid this in the future, AWS Support recommended that I move to a larger instance or set up a swap file, and also set up third-party alarms.
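
As a starting point for that kind of alarm, the instance can check its own memory by reading /proc/meminfo. The sketch below is an illustration only: the 100000 kB threshold is an arbitrary placeholder, the function names are made up, and how you deliver the warning (email, a monitoring service, and so on) is left open.

```python
from typing import Dict, List


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo style text into a mapping of field name to its number."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_warnings(info: Dict[str, int], min_available_kb: int = 100_000) -> List[str]:
    """Return human-readable warnings; the threshold is a hypothetical default."""
    warnings: List[str] = []
    if info.get("SwapTotal", 0) == 0:
        warnings.append("no swap configured")
    # Older kernels lack MemAvailable, so fall back to MemFree.
    available = info.get("MemAvailable", info.get("MemFree", 0))
    if available < min_available_kb:
        warnings.append(f"only {available} kB available")
    return warnings
```

Run from cron, something like memory_warnings(parse_meminfo(open("/proc/meminfo").read())) gives you a list you can log or send somewhere before the instance stops responding.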

score 0 · Answer 3 · answered Sep 27 '17 at 15:12

In the console, under settings, you should find a little "Console Snapshot" option that will take a "picture" of the raw console output.

I'll bet you that the instance crashed, and now it's stuck in some recovery state.

This happens semi-frequently. If you consider that AWS has 7x nines of EC2 resiliency, you should expect an instance (some instance, somewhere) to fault out every 20 minutes.

Daryl Metzler

I cannot find "Settings" and "Console Snapshot". Could you explain where exactly? And about the crashing, what would you suggest to do? You mean that this is normal and that I should have a mechanism that stops and starts the server every time it crashes? – Nima Sep 28 '17 at 20:58
@Daryl Metzler EC2 SLA says "99.95% uptime", and they don't promise anything around "seven nines". You're possibly thinking of S3. It's NOT normal for an EC2 instance to crash. I have Wordpress on t2.micro and t2.nano instances, they basically run forever if I ignore them. In practice they tend to run for a few months before I'll do a reboot to update the kernel. – Tim Sep 28 '17 at 22:26
