23

I've replicated this two or three times, so I'm guessing there's something wrong with what I'm doing.

Here are my steps:

  1. Launch new instance via EC2 Management console using: Ubuntu Server 13.10 - ami-ace67f9c (64-bit)
  2. Launch with defaults (using my existing key pair)
  3. The instance starts. I can SSH to it using Putty or the Mac terminal. Success!
  4. I reboot the instance
  5. 10 minutes later, when the instance should be back up and running, my terminal connection shows:

    stead:~ stead$ ssh -v -i Dropbox/SteadCloud3.pem ubuntu@54.201.200.208
    OpenSSH_5.6p1, OpenSSL 0.9.8y 5 Feb 2013
    debug1: Reading configuration data /etc/ssh_config
    debug1: Applying options for *
    debug1: Connecting to 54.201.200.208 [54.201.200.208] port 22.
    debug1: connect to address 54.201.200.208 port 22: Connection refused
    ssh: connect to host 54.201.200.208 port 22: Connection refused
    stead:~ stead$
    

Fine, I understand that the public IP address can change, so checking the EC2 management console, I verify that it is the same. Weird. Just for fun, I try connecting with the public DNS hostname: ec2-54-201-200-208.us-west-2.compute.amazonaws.com. No dice, same result.

Even using the "Connect via Java SSH client" option built into the EC2 console, I get "Connection refused".

I checked the security groups. This instance is in the group launch-wizard-4. Looking at the inbound rules for this group, port 22 is allowed from 0.0.0.0/0, so that should be anywhere. I know I'm reaching my instance and that this is the right security group because of the ping behavior: at first I can't ping the instance, but as soon as I enable ICMP for this security group my pings go through.
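
For what it's worth, the same inbound rules can also be checked with the AWS CLI (assuming it is installed and configured; the group name is the launch-wizard-4 group mentioned above):

    # Dump the inbound rules of the security group the instance is using
    aws ec2 describe-security-groups \
        --filters Name=group-name,Values=launch-wizard-4 \
        --query 'SecurityGroups[].IpPermissions'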

I've found a few other posts around the internet with similar error messages, but most seem to be easily resolved by tweaking the firewall settings. I've tried a few of these, with no luck.

I'm guessing there's a simple EC2 step I'm missing. Thanks for any help you can give, and I'm happy to provide more information or test further!

Update - Here are my system logs from the Amazon EC2 console: http://pastebin.com/4M5pwGRt
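
(For anyone following along: the same log can be pulled without the web console, assuming the AWS CLI is set up; the instance ID below is just a placeholder.)

    # Fetch the instance's console/boot log (placeholder instance ID)
    aws ec2 get-console-output --instance-id i-0123456789abcdef0 --output text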

SteadH
  • I would suggest that you look into the system logs on the AWS console to see if something went wrong while rebooting. You might also want to be sure that both status checks pass after the system reboots and while you are trying to SSH (visible on the console only). – APZ Jan 18 '14 at 19:31
  • Did you do nothing after the first connection? No messing with iptables or sshd config files? Because it looks like the connection is being dropped, not that port 22 is unavailable. – typositoire Jan 18 '14 at 19:39
  • Did you mess with `/etc/fstab` before rebooting? – David Levesque Jan 18 '14 at 23:12
  • No changing of iptables or fstab before rebooting. The first command I ran was "reboot now". I'll update above with my AWS system logs. – SteadH Jan 19 '14 at 23:01
  • Also, status checks are both good - 2/2! I was hoping I had something simple wrong with my setup... maybe not! – SteadH Jan 19 '14 at 23:07
  • I don't think you are missing any steps in provisioning the server. You might want to change the AMI and see how it goes; I don't have a very concrete reason for changing the AMI, but since we have no real idea of what is happening it might be worth a shot. – APZ Jan 20 '14 at 07:09
  • Changing the AMI did it! That version of Ubuntu Server was just funky. – SteadH Mar 12 '14 at 01:31
  • How are you rebooting: via the ssh command `reboot`, or via the AWS web user interface? I was facing the same issue after rebooting via SSH (PuTTY); using the AWS web console to stop and start the instance later solved my issue. – Akki Mar 24 '20 at 10:19
  • There is another thing I experienced this morning and totally overlooked while Googling. Usually it's good practice to declare which source your SSH rule allows. However, remember that your IP is also dynamic when working from home. Your security group may therefore need Source 0.0.0.0/0 - and that's okay, because you still need the right private key to authenticate. I also started detaching my EBS volume, but then paid attention to the above and fixed the real problem. – ha9u63a7 Nov 14 '20 at 14:29

9 Answers

20

From the AWS Developer Forum post on this topic:

Try stopping the broken instance, detaching the EBS volume, and attaching it as a secondary volume to another instance. Once you've mounted the broken volume somewhere on the other instance, check the /etc/sshd_config file (near the bottom). I had a few RHEL instances where Yum scrogged the sshd_config inserting duplicate lines at the bottom that caused sshd to fail on startup because of syntax errors.

Once you've fixed it, just unmount the volume, detach, reattach to your other instance and fire it back up again.

Let's break this down, with links to the AWS documentation:

  1. Stop the broken instance and detach the EBS (root) volume by going into the EC2 Management Console, clicking on "Elastic Block Store" > "Volumes", then right-clicking on the volume associated with the instance you stopped.
  2. Start a new instance in the same region and of the same OS as the broken instance then attach the original EBS root volume as a secondary volume to your new instance. The commands in step 4 below assume you mount the volume to a folder called "data".
  3. Once you've mounted the broken volume somewhere on the other instance,
  4. check the broken volume's "/etc/ssh/sshd_config" file for the duplicate entries by issuing these commands (a consolidated command sketch follows this list):
    • cd /data/etc/ssh
    • sudo nano sshd_config
    • ctrl-v a bunch of times to get to the bottom of the file
    • ctrl-k all the lines at the bottom mentioning "PermitRootLogin without-password" and "UseDNS no"
    • ctrl-x and Y to save and exit the edited file
  5. @Telegard points out (in his comment) that we've only fixed the symptom. We can fix the cause by commenting out the 3 related lines in the "/etc/rc.local" file. So:
    • cd /data/etc
    • sudo nano rc.local
    • look for the "PermitRootLogin..." lines and comment them out (or delete them)
    • ctrl-x and Y to save and exit the edited file
  6. Once you've fixed it, just unmount the volume,
  7. detach it by going into the EC2 Management Console, clicking on "Elastic Block Store" > "Volumes", then right-clicking on the volume you just unmounted,
  8. reattach to your other instance and
  9. fire it back up again.
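
For reference, here is a minimal sketch of steps 2-6 as shell commands run on the recovery instance. The device name /dev/xvdf1 and the mount point /data are assumptions (they depend on how you attached the volume and how the AMI partitions its root disk), not something from the original forum post:

    # Mount the broken root volume (attached as a secondary disk) on the recovery instance;
    # /dev/xvdf1 and /data are assumptions for this sketch
    sudo mkdir -p /data
    sudo mount /dev/xvdf1 /data

    # Inspect the end of the broken instance's sshd config for duplicated lines
    tail -n 20 /data/etc/ssh/sshd_config

    # Optionally ask sshd to syntax-check that file before reattaching the volume
    sudo sshd -t -f /data/etc/ssh/sshd_config

    # Once the file is fixed, unmount so the volume can be detached and reattached
    sudo umount /data
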
Jeromy French
  • This question might also be relevant: http://serverfault.com/q/325140/153062 – Jeromy French May 28 '14 at 15:57
  • Same issue and similar proposed fix at http://stackoverflow.com/a/21563478/1430996. The comment is particularly helpful. – Jeromy French May 28 '14 at 20:19
  • Thanks for this! I suspect this would have fixed the issue, and that's a good way to get at that SSH log. Thanks! – SteadH Jun 02 '14 at 23:57
  • This worked, thanks, although my problem (same symptom: "connection refused") was due to wrong ownership of the directory /var/empty/sshd; it should have been root:root. Why it changed: no idea, we never even went near it. Oh well. – cucu8 Mar 12 '18 at 14:33
  • @JeromyFrench I have the same issue. I followed the procedure but I didn't find "PermitRootLogin without-password"; it has "PermitRootLogin=prohibit-password". What should I do? – Vaibhav Kumar Apr 14 '18 at 16:01
9

Had a similar behavior today on my EC2 instance and tracked it down to this: when I run `sudo reboot now` the machine hangs and I have to restart it manually from the AWS management console; when I run `sudo reboot` it reboots just fine. Apparently "now" is not a valid option for reboot, as pointed out here: https://askubuntu.com/questions/397502/reboot-a-server-from-command-line

thoughts?
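
For what it's worth, if the goal is an immediate reboot, the command that actually takes a time argument is shutdown, not reboot; a minimal comparison (hedged - this just restates the linked Ask Ubuntu answer, nothing EC2-specific):

    # 'reboot' takes no time argument; per the link above, the stray "now" is what
    # apparently made the instance hang
    sudo reboot

    # 'shutdown' does take a time argument; '-r now' means reboot immediately
    sudo shutdown -r now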

oromoiluig
1

I had the same issue after running a vanilla `sudo reboot` command. I found I was able to resolve the issue by completely stopping (not rebooting) my instance using the AWS console and then starting it back up.

For whatever reason, rebooting the instance from the AWS console (clicking the reboot action as opposed to stopping and then starting the instance) did not fix the problem.
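
If you would rather script that stop/start cycle than click through the console, a hedged AWS CLI sketch is below (the instance ID is a placeholder; unlike a reboot, a full stop/start can land the instance on different underlying hardware):

    # Stop, wait for the stopped state, then start again (placeholder instance ID)
    aws ec2 stop-instances --instance-ids i-0123456789abcdef0
    aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
    aws ec2 start-instances --instance-ids i-0123456789abcdef0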

ACV
  • This worked for me. After rebooting with the command `sudo reboot`, the instance stopped responding and even rebooting from the AWS console did not work, so I stopped the instance and then started it again. Now it is connecting fine. Not sure why this has been downvoted. – Umair Malhi Oct 08 '21 at 07:56
0

It may not help the situation any, but I have seen some cases where a reboot on EC2 gets 'stuck'. If you do a 'reset' on the VM and then retrieve the system logs, it may change the behavior. Be sure that the logs are from the second boot and not the first one - they tend to be delayed on updates.

One other thing to check is to be sure that the instance is responding on that IP. You appear to be getting "connection refused" above, which sounds like the instance is up but SSH isn't running or is firewalled; still, be sure that the instance has fully rebooted.

You could also try opening all ports from a test system and seeing what `nmap` shows you: are any other services responding on the instance?
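
A hedged example of that nmap check, run from a test machine the security group allows (the IP is the one from the question):

    # -Pn skips ping-based host discovery (ICMP may be blocked); probe a few common ports
    nmap -Pn -p 22,80,443 54.201.200.208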

Nathan Neulinger
0

Right-click on the instance name and click on "Change Security Groups". Make sure that the security group you created that allows port 22 from anywhere is checked and applied to this instance.
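
If you prefer to verify this from the command line, something like the following (the instance ID is a placeholder) lists the security groups actually attached to the instance:

    # Show which security groups the instance is really using (placeholder instance ID)
    aws ec2 describe-instances \
        --instance-ids i-0123456789abcdef0 \
        --query 'Reservations[].Instances[].SecurityGroups'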

shaimoom
-1

I had a similar problem: my EC2 Amazon Linux instance was not reachable anymore after running `sudo reboot`.

No SSH access, and the stop/start/reboot commands from the Amazon admin console gave me no result either.

I was finally able to restart my instance by creating an image via the Amazon console. The image creation process seems to fix the instance state.

Hope it helps ;)
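
A hedged CLI equivalent of that image-creation step, in case anyone wants to script it (the instance ID and image name are placeholders; note that create-image reboots the instance by default, which may well be the part that nudges it back to life):

    # Create an AMI from the stuck instance; the implicit reboot is the interesting side effect
    aws ec2 create-image \
        --instance-id i-0123456789abcdef0 \
        --name "rescue-image-01"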

-2

I got this problem after doing `sudo reboot now` via SSH on my EC2 server running Ubuntu 14.04. It worked fine after rebooting again using the EC2 Management Console.

-2

In my case I'd set up a security group to allow port 22 connections from my IP only. Some days later my ISP changed my IP address, hence the security group needed updating.
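
If you want to keep the rule pinned to your current address instead of opening it to 0.0.0.0/0, a hedged sketch for refreshing it is below (the security group ID is a placeholder, and the stale rule for the old IP still has to be revoked separately):

    # Look up the current public IP and allow it for SSH (placeholder security group ID)
    MYIP=$(curl -s https://checkip.amazonaws.com)
    aws ec2 authorize-security-group-ingress \
        --group-id sg-0123456789abcdef0 \
        --protocol tcp --port 22 \
        --cidr "${MYIP}/32"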

redcalx
-3

As mentioned, you probably messed with /etc/fstab.

I had this problem. First you have to re-attach the volume at /dev/sda1, as the warning message says.

Then I still couldn't SSH. I realized that I also had to attach the other volume I had created, and that fixed the SSH problem.

Then you can log in and fix the fstab back to the original.
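
To keep a broken secondary-volume entry from hanging the boot (and with it sshd) in the future, here is a hedged example of a more forgiving fstab line; the device /dev/xvdf, the mount point /data and the ext4 filesystem are assumptions, not details from the answer above:

    # 'nofail' lets the boot continue if the volume is missing; 'nobootwait' was the
    # Ubuntu 12.04/14.04-era option with a similar effect
    /dev/xvdf   /data   ext4   defaults,nofail,nobootwait   0   2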