
We have an automation tool that logs in over SSH and sends commands. This works fine when the server is already running. While the server is booting, however, the tool checks whether the SSH port (22) is open, and as soon as it is, it tries to connect to the server and send commands.

However, when the tool checks port 22 during the boot sequence and then tries to connect with the SSH client, the server rejects the connection or the SSH client returns the error "ssh port is not open".

We investigated this with telnet and saw that, during the boot sequence, sshd starts, opens port 22 and begins listening, but then somehow closes the port and opens it again a little later. That is exactly when our automation tool tries to log in.

My question is: how can we make sure that the SSH port is really open and ready to accept commands?

Thank you for your time. Best regards,

Harun Baris Bulut

6 Answers


First, it seems that the automation tool is not checking ssh's exit status. I would try to fix the problem there.

One solution is to file a bug with the team that created the tool.

Another solution would be to wrap the ssh command in a script that handles this transparently, e.g. a script at /opt/myproject/ssh_wraper.sh.

Here you can have something like:

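#!/bin/bash
# Retry while ssh itself fails (exit status 255); any other status is the
# remote command's own exit status and is passed back to the caller.
SSH_EXIT_STATUS=255
while [[ $SSH_EXIT_STATUS -eq 255 ]]; do
    ssh "$@"
    SSH_EXIT_STATUS=$?
    [[ $SSH_EXIT_STATUS -eq 255 ]] && sleep 2   # don't hammer a booting server
done
exit "$SSH_EXIT_STATUS"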
Mircea Vutcovici
  • As the automation tool is not a bash script, the exit-status problem may well be real. We will check the tool for this and, as you said, try putting a bash script between the automation tool and ssh. Thank you for your answer; I will let you know the result. – Harun Baris Bulut Jan 30 '13 at 15:32

You could experiment with the exit status you get from a trivial remote command such as ssh user@host "echo 0 > /dev/zero".

If the command completes successfully, you get exit status 0, indicating that the system is ready. A failed attempt results in exit status 255.

You might also want to use the -o ConnectTimeout= and -o ConnectionAttempts= options.
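A minimal sketch of that probe in bash, assuming key-based authentication is already set up (BatchMode=yes makes ssh fail instead of prompting for a password). The helper name wait_for_ssh, the retry count, the delay and the timeout values are illustrative assumptions, not part of the original answer.

```bash
#!/bin/bash
# wait_for_ssh HOST [MAX_TRIES] [DELAY]
# Hypothetical helper: keeps running the trivial remote command "true"
# until ssh can log in and execute it (exit status 0). A status of 255
# means ssh itself failed (port closed, connection refused or reset,
# timeout); any non-zero status is treated as "not ready yet".
wait_for_ssh() {
    local host=$1 max_tries=${2:-30} delay=${3:-5}
    local try status
    for (( try = 1; try <= max_tries; try++ )); do
        # -n: don't read the caller's stdin; BatchMode: never prompt.
        # ConnectTimeout bounds each TCP connect; ConnectionAttempts lets
        # ssh retry the connect itself (once per second) a few times.
        ssh -n -o BatchMode=yes -o ConnectTimeout=5 \
            -o ConnectionAttempts=3 "$host" true
        status=$?
        (( status == 0 )) && return 0
        echo "attempt $try/$max_tries: ssh exited with status $status" >&2
        (( try < max_tries )) && sleep "$delay"
    done
    return 1
}
```

ConnectionAttempts only retries the TCP connection; a connection that is accepted and then dropped mid-handshake, as described in the question, still ends with status 255, which is why the outer loop is kept. Usage would be along the lines of: wait_for_ssh user@host 30 5 && ssh user@host "$cmd".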

I'd agree with Steve too, though: maybe just wait a little longer. Depending on how aggressively your tool probes the port, increase the delay before attempting a login.

Michael
  • We will try the ssh user@host ... command and let you know the result. I think this is more logical because of the problem I mentioned in my comment on Steve's answer. – Harun Baris Bulut Jan 29 '13 at 15:23

You could put a loop ahead of the login to wait until the port is open.

until nc -zvw 1 "$host" 22; do
  sleep 2
done
ssh "$host" "$cmd"

If you don't want to risk an endless loop when the condition never becomes true, you could add a second exit condition, such as a maximum number of attempts. That exercise is left to the reader. :)
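One possible solution to that exercise is sketched below: the same nc probe, but giving up after a fixed number of attempts. It assumes an nc that supports -z and -w, as the loop in this answer does; the name wait_for_port and the default limits are hypothetical.

```bash
#!/bin/bash
# wait_for_port HOST PORT [MAX_TRIES] [DELAY]
# Hypothetical helper: polls HOST:PORT with nc and returns 0 as soon as
# something is listening, or 1 after MAX_TRIES failed probes.
wait_for_port() {
    local host=$1 port=$2 max_tries=${3:-60} delay=${4:-2}
    local try
    for (( try = 1; try <= max_tries; try++ )); do
        # -z: only check that the port accepts a connection, send no data
        # -w 1: give up on this single probe after one second
        if nc -z -w 1 "$host" "$port"; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}
```

It would replace the until loop, e.g. wait_for_port "$host" 22 || exit 1 followed by the ssh line. Note that an open port only proves sshd is listening at that moment; given the port flapping described in the question, checking ssh's own exit status is the stronger test.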

Aaron Copley
  • We are definitely working on this exercise :) Since Steve's solution is almost the same, we have already tried it, and it led us to another odd problem: at some point SSH stops responding, maybe because it starts blocking us. I will try this too :) – Harun Baris Bulut Jan 30 '13 at 15:28

Would the easiest solution be to have the automation tool attempt to log in and, if that fails, wait x minutes and try again?

While the server is booting, odd things like this can happen.
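Because the question describes sshd opening port 22, closing it, and opening it again during boot, a single successful attempt may not mean the server is ready. One possible refinement of this wait-and-retry idea (an assumption layered on the answer, not something it proposes) is to require several consecutive successful logins. The helper name wait_for_stable_ssh, the default counts and the use of BatchMode=yes (key-based authentication) are all hypothetical choices.

```bash
#!/bin/bash
# wait_for_stable_ssh HOST [NEEDED] [MAX_TRIES] [DELAY]
# Hypothetical helper: succeeds only after NEEDED consecutive successful
# logins, so a port that opens, closes and reopens during boot is not
# mistaken for a ready server. Returns 1 after MAX_TRIES attempts.
wait_for_stable_ssh() {
    local host=$1 needed=${2:-3} max_tries=${3:-60} delay=${4:-5}
    local try streak=0
    for (( try = 1; try <= max_tries; try++ )); do
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            streak=$(( streak + 1 ))
            (( streak >= needed )) && return 0
        else
            streak=0   # any failure restarts the count
        fi
        sleep "$delay"
    done
    return 1
}
```

The trade-off is a few extra seconds of waiting (roughly NEEDED times DELAY) in exchange for not handing a half-booted server to the automation tool.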

Steve
  • As you said, the easiest solution seems to be retrying the login after some minutes or seconds. We are trying that right now, but we have seen other odd behaviour, such as not being able to connect to the server at all. – Harun Baris Bulut Jan 29 '13 at 15:22

Something you could try is adding a script to your server's boot sequence (in /etc/rc.local, for example) that lifts the firewall block on port 22. As the comments in /etc/rc.local state, this script is executed after all the other init scripts. So as long as your server hasn't finished its boot sequence, port 22 stays unreachable behind the firewall. This has the advantage of leaving the automation tool unmodified.

This is based on RHEL 6; the init scripts may differ on your distribution.
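A minimal sketch of this approach for RHEL 6, assuming the stock iptables configuration: the rule -A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT has been removed from /etc/sysconfig/iptables, so new SSH connections fall through to the ruleset's final catch-all REJECT rule while the rest of the boot sequence runs. This is an illustration under that assumption, not a tested configuration for your ruleset.

```bash
# Appended to /etc/rc.local, which RHEL 6 runs after all other init scripts.
# Assumption: the stock "--dport 22 -j ACCEPT" rule was removed from
# /etc/sysconfig/iptables, so SSH is rejected until this line inserts an
# ACCEPT rule at the top of the INPUT chain.
iptables -I INPUT -p tcp --dport 22 -m state --state NEW -j ACCEPT
```

Keep in mind that restarting the iptables service later reloads /etc/sysconfig/iptables, which closes port 22 again until the rule is re-added.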

Rosco

This is what I do when I start a server in AWS and wait for it to become available for SSH connections. I use bash for this:

status="notknown"
until [[ $status == "running" ]]; do
    # placeholder: substitute your EC2 tools status command
    status=$(EC2 tools command to get the status)
    if [[ $status != "running" ]]; then sleep 3; fi
done
APZ
  • This is almost what we do, although we are not using AWS. In our case the virtualization platform tells us that the server is running. It actually is running, but the port somehow gets closed at some point, so I don't think this is a complete solution. We will definitely try it, though. Thank you very much for this solution. – Harun Baris Bulut Jan 30 '13 at 15:26