I need to log in to a system that went into a read-only state. I can ping it just fine, but I can't ssh in anymore. Is there some special command-line flag or parameter I can pass to ssh that lets me log in to a system that has gone into read-only mode?

Forgot to add the exact connection error:

OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to 192.168.0.4 [192.168.0.4] port 22.
debug1: Connection established.
debug1: identity file /home/username/.ssh/identity type -1
debug1: identity file /home/username/.ssh/identity-cert type -1
debug1: identity file /home/username/.ssh/id_rsa type -1
debug1: identity file /home/username/.ssh/id_rsa-cert type -1
debug1: identity file /home/username/.ssh/id_dsa type -1
debug1: identity file /home/username/.ssh/id_dsa-cert type -1
ssh_exchange_identification: Connection closed by remote host

I should add that ping to that box is pretty robust:

ping 192.168.0.4
PING 192.168.0.4 (192.168.0.4) 56(84) bytes of data.
64 bytes from 192.168.0.4: icmp_seq=1 ttl=64 time=0.662 ms
64 bytes from 192.168.0.4: icmp_seq=2 ttl=64 time=0.088 ms
64 bytes from 192.168.0.4: icmp_seq=3 ttl=64 time=0.089 ms
htfree
  • Not usually, no; this is what remote console access is for. – MadHatter Apr 24 '15 at 06:21
  • I don't positively know that sshd will refuse connections when it has no writable file system. I wouldn't assume that the ssh connection-closed error is necessarily due to the file system problem. How exactly do you know the fs has been remounted read-only? Are you using remote syslog? – caskey Apr 24 '15 at 06:35
  • @MadHatter Do you know why it is so? I would assume the ssh server was designed to still allow logins even if the root file system is mounted read-only. But OTOH the kernel doesn't remount the file system read-only for no reason. The underlying error causing the file system to go read-only might also prevent the ssh server from accessing the files it needs to allow a login. – kasperd Apr 24 '15 at 06:36
  • If the system is read-only, the ssh server can't write agent-related temporary files in `/tmp`, can't make new `wtmp` and `lastlog` entries, and so on. In brief, unless the `ssh` server has been configured to allow completely footprintless logins to a normally working system, I think it would be highly remiss of it to allow logins when those footprints can't be laid down, and even more of a security disaster to have a special flag that allowed the client to request such silent operation! – MadHatter Apr 24 '15 at 06:42
  • That said, htfree, what do you get when you do `telnet 192.168.0.4 22`? – MadHatter Apr 24 '15 at 06:44
  • @MadHatter thanks, yeah, those lazy admins with no serial console set up! Oh, oops, I was one of them, but hey, it was my friend's server so I expected him to set it up. I may have to see if I can get a KVM console on it. – htfree Apr 24 '15 at 06:45
  • @MadHatter thanks, I'll have to wait until my friend gets me telnet (it seems not to be installed). I guess he's afraid I'll steal all his bitcoins if he gives me root (I have root on 192.168.0.4 on that LAN, not on the client) :D – htfree Apr 24 '15 at 06:47
  • @caskey I had logged in before, and one drive of the mirror was down and there were some swap errors in the log: some issues with swap being mirrored, both halves having basically the same label, and one of the drives being dead. So I know the system was going downhill. – htfree Apr 24 '15 at 06:50
  • @kasperd yeah, I would love to know for sure too whether ssh is supposed to work on a system that has put itself into read-only mode due to issues. See the more verbose ssh logging I added to the question. – htfree Apr 24 '15 at 06:51
  • @htfree this indicates the system is in a bad way, but it doesn't specifically show that the dropped connection is related to the disk problem. You may (for example) have a completely full process table due to the massive backlog of I/O timeouts and errors preventing things from winding up cleanly. These are related but not directly interfering problems. At this point, if you have critical data on the host, powering it down and recovering the disks is the best option. If the data is not irreplaceable, a remote reboot is the next step. Of course, you may wind up back in a similar bottleneck. – caskey Apr 24 '15 at 06:52
  • @MadHatter Anybody security-aware would not forward an agent unless it was strictly needed. If you don't forward an agent, then those agent-related files won't be needed. Logs are not written to disk by the ssh server itself. It sends log entries to syslog, which writes them to disk. That means that if syslog is unable to write them, ssh could still go on. Updating `wtmp` and `lastlog` might happen within the `sshd` process, but I don't think a failure to update those would prevent a login. – kasperd Apr 24 '15 at 06:53
  • @caskey Yeah, I think you have a very good point about that. If that's the case, I guess I have no option but to try to order some KVM console access to it. I was really upset with the way swap was set up. With this experience, I think it's imperative to never use labels for managing mirrored swap, or maybe never use mirrored swap, period. I think the failed disk combined with the mirrored swap is the cause of my woes. Sadly, I may just have to accept your answer as the most likely one. – htfree Apr 24 '15 at 06:57
  • @htfree If the disk has stopped responding, then the file system is going to experience errors, which will cause it to go read-only. Going read-only is a safety measure to guard against data loss. I would not expect the file system being read-only to prevent ssh logins in itself. But if the disk has indeed stopped responding, then sshd is also going to get I/O errors every time it tries to read something which is not in cache. And those I/O errors could definitely prevent a login. – kasperd Apr 24 '15 at 06:57
  • @kasperd The last time I was able to log in, one disk of the RAID was dead and dmesg gave errors, but from what I recall they only related to swap, which was stupidly somehow still trying to access the dead drive. See my comment to caskey... – htfree Apr 24 '15 at 06:59
  • @htfree (having been in this exact situation before; ssh dying with connection closed remote) at this point it all comes down to whether your goal is to bring the hardware back online asap or bring the data on the disk back online intact. In my past this usually has meant a late night drive to the Colo after paging the oncall Colo engineer to let me in. – caskey Apr 24 '15 at 07:00
  • @htfree If there is a read error on swap or a memory-mapped file (which includes all executables and libraries), then the kernel cannot give the process a proper error message, and it will have to kill the process. – kasperd Apr 24 '15 at 07:02
  • @everyone what about that line I posted, "debug1: Connection established."? So did ssh really connect in some way and then get disconnected? – htfree Apr 24 '15 at 07:03
  • @caskey I was afraid of that. I've already taken most of my most important data off, and I have an older copy of everything, but there are some MySQL databases I still need off of it. I dread having to go down there, but I might not have much choice. I just want the data off of it; I've already migrated to a new server. – htfree Apr 24 '15 at 07:05
  • @kasperd so I guess you're saying it's hopeless and, like caskey said, I'll have to get my butt down there and get my drive? – htfree Apr 24 '15 at 07:07
  • Curious why ping is so robust. Also, can someone explain the last lines of the ssh output: what does the identity file "type -1" mean? – htfree Apr 24 '15 at 07:08
  • @htfree it's your call, but if you already have "most" of the data, I would kamikaze it and ask the Colo (or remote power strip) to power cycle the box. Then again, I'm just some guy on the internet (who feels your pain), but when you break it you alone get to keep both halves. :-) Good luck. – caskey Apr 24 '15 at 07:09
  • @htfree The symptoms you have presented so far do not prove that it is a disk problem. There are other possible reasons for the disconnect, including network-related ones. But the fact that you already saw the RAID being degraded does make the disk error explanation more likely. – kasperd Apr 24 '15 at 07:09
  • @htfree The reason ping is reliable in this case is because it is handled entirely within the kernel. Responding to a ping does not involve any user mode code, and it can be processed without any disk access whatsoever. There are plenty of error scenarios which will leave a system able to respond to ping but not be able to even present the banner when an ssh connection is attempted. You need to show us the output of telnet command suggested earlier. – kasperd Apr 24 '15 at 07:12
  • @caskey hey man thanks, yea you're right, I'm going to likely do that or just go pick up the drive since I don't plan to fix it, I've migrated already. I may wait till tomorrow for my tinfoil hat friend to give me access to his hidden MySQL client and telnet and then likely just grab the hard drive. – htfree Apr 24 '15 at 07:12
  • @htfree *fistbump* good luck. – caskey Apr 24 '15 at 07:15
  • @kasperd thanks for the ping explanation. Any idea about the ssh verbose output, with ssh connected and the type -1 on the identity files before the disconnection? I'll have my friend install telnet tomorrow so I can try; see my last response to caskey. Thanks to all those who tried to help, kasperd, caskey, MadHatter, wish you long-running stable boxes :D – htfree Apr 24 '15 at 07:16
  • @htfree `type -1` just means that particular file isn't present on the client. That is perfectly normal. Those files are not required, and even if you were using key-based authentication, you'd likely only have a key in one of those formats. Those files are not used that early during the process anyway. I am, however, wondering about the end of your output, because I see a couple more log messages from the ssh client on my machine before it stops to wait for the banner. Maybe that is just due to different versions. – kasperd Apr 24 '15 at 07:25
  • **Any chance of the telnet results requested?** – MadHatter Apr 24 '15 at 07:26
  • MadHatter, sorry, my friend is too lazy. He was supposed to install them for me on the weekend; I'll call him tonight. I had asked for rsh (to try rsync without ssh), telnet, and access to a MySQL client. I'll update on here. I might just go to the datacenter Wednesday and take out the drive, we'll see. – htfree Apr 28 '15 at 01:58
  • I suspect the answer to your question is "No" but there are too many comments to check for further information. It's best to add relevant details and clarify what you're asking by editing the question itself. – Anthony Geoghegan Apr 28 '15 at 10:35
  • Finally some closure on my poor server: my friend was so lazy he left things as they were until his side of the server (2 servers in one 1U rack) also went completely offline. He paid for his sins and had to drive down 6 or 8 hours. We took the server out, replaced the dead drives, ran fsck, and recreated our mirrors, and both servers in the 1U are back up and working. I've decided that from now on I'm going to use file-based swap instead of partitions and labels, and especially never to put swap on a mirrored device: the labels were identical, and the kernel was still trying to write to swap on the failed drive. – htfree Aug 05 '15 at 02:37

2 Answers

You could log in by invoking a non-login shell session. For example, you can pass a command for ssh to run after successfully authenticating:

ssh user@host bash --noprofile --norc

Of course, this uses bash as the shell. Different shells will require appropriate parameters in order not to trigger a wtmp/utmp update, as well as not trying to do things that could fail and log you off early (i.e., before the shell finishes up its usual tasks when normally opening up a login session).

A note of warning: the shell will be rather (very!) limited, usually without prompts and other fanciness. But it is enough for you to get into the machine and check what is wrong.

Edited to add: depending on the circumstances of your host, it could be necessary to specify a full path, for example /bin/bash as the command to execute upon successfully authenticating.

rnsanchez
  • It doesn't matter what command the OP asks the server to run. The ssh server is dropping the TCP connection immediately after it is made. He's not even authenticating. – Kenster Apr 28 '15 at 15:19
  • Now that @Kenster raised the red flag, this could be a case where `ssh` and `sshd` are not agreeing on the authentication method. Specifically, auth is key-based, but there are no suitable keys to proceed. Other than this, I'm out of ideas. Perhaps `ssh -vvv` could give further insight (I'm assuming only `ssh -vv` was used, as the highest debug level shown was `debug2`). – rnsanchez Apr 28 '15 at 15:56
  • `this could be a case where ssh and sshd are not agreeing on the authentication method` Like I said, the server is dropping the connection as soon as it is made. No part of the ssh protocol is actually running. The very first thing the server would normally do is to send its software version string to the client in cleartext, and the server isn't even doing that, according to the debug trace. – Kenster Apr 28 '15 at 16:02
  • The debug output I posted was already from -vvv. I also tried ssh -vvv user@host bash --noprofile --norc and it gives exactly the same result. – htfree Apr 28 '15 at 23:22

The short answer is that your SSH server is probably nonfunctional due to the system problems you've described. I don't think there is anything the client could do differently.

I'll break down your debug trace:

debug1: Connecting to 192.168.0.4 [192.168.0.4] port 22.
debug1: Connection established.

The client made a connection to that address and port. This implies that the sshd process on the server is still running.

debug1: identity file /home/username/.ssh/identity type -1
debug1: identity file /home/username/.ssh/identity-cert type -1
debug1: identity file /home/username/.ssh/id_rsa type -1
debug1: identity file /home/username/.ssh/id_rsa-cert type -1
debug1: identity file /home/username/.ssh/id_dsa type -1
debug1: identity file /home/username/.ssh/id_dsa-cert type -1

Your local ssh client looked for those key files and didn't find them. These are all default keyfile names that it would normally look for. This is only a problem if you expected one of those files to be present. They're also not really relevant here, because the client never got a chance to authenticate.

ssh_exchange_identification: Connection closed by remote host

The remote server closed the TCP connection. This specific message means the server did a "normal" close on the connection. If the server had crashed, you'd see a different message saying "connection reset by peer".

Normally, the first thing an SSH server will do is to send its software version string. If that had happened here, you'd see this in the debug trace:

...
debug1: identity file /home/foo/.ssh/id_ecdsa-cert type -1
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.9p1 Debian-5ubuntu1.1
debug1: match: OpenSSH_5.9p1 Debian-5ubuntu1.1 pat OpenSSH*

The server isn't even getting this far before closing the connection.
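
If telnet isn't available on the client, the same check can be approximated with a few lines of Python. This is only a rough sketch: the function name `probe_ssh_banner` and its status labels are made up for illustration, and it reads just the first bytes the server sends, which on a healthy server would normally start with the version string.

```python
import socket


def probe_ssh_banner(host, port=22, timeout=5.0):
    """Connect and report what the server does before any SSH protocol runs.

    Returns (status, detail). The status labels are illustrative only:
    "banner"  - the server sent something (normally its version string)
    "closed"  - the server closed the connection cleanly without sending anything
    "reset"   - the connection was reset by the peer
    "refused" - nothing is listening on that port
    "timeout" - no answer within `timeout` seconds
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(255)
    except ConnectionResetError:
        return ("reset", "")
    except ConnectionRefusedError:
        return ("refused", "")
    except socket.timeout:
        return ("timeout", "")
    if not data:
        # recv() returning b"" means the peer performed an orderly close.
        return ("closed", "")
    return ("banner", data.decode("ascii", errors="replace").strip())
```

Calling `probe_ssh_banner("192.168.0.4")` against the box in the question would be expected to come back as "closed" if the trace above is accurate, while a healthy server should give "banner" along with its version string.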

If the server were healthy, the usual explanation for what you're seeing would be that the server is rejecting the client due to TCP wrappers. But in your case, something about the system state is probably preventing sshd from working properly. For example, immediately after accepting a connection the server will call fork() to create a child process. The child process handles the connection while the parent continues listening for further connections. If the fork fails, then the server will close the connection without sending anything to the client.
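
To make the fork scenario concrete, here is a heavily simplified accept-then-fork sketch in Python. It is not OpenSSH's actual code: `serve_one`, `BANNER`, and the injectable `fork` parameter are hypothetical names used only to show why a failed fork looks, from the client side, like a clean close with no version string. It is Unix-only because it relies on `os.fork`.

```python
import os
import socket

# Placeholder version string for the sketch, not a real server's banner.
BANNER = b"SSH-2.0-Sketch\r\n"


def serve_one(listener, fork=os.fork):
    """Accept one connection and hand it to a forked child.

    Returns the child's pid in the parent, or None if the fork failed.
    `fork` is injectable only so the failure path can be simulated.
    """
    conn, _addr = listener.accept()
    try:
        pid = fork()
    except OSError:
        # For example a full process table or no memory: there is no child
        # to talk to the client, so the connection is closed with nothing
        # sent. The client sees an orderly close and no banner.
        conn.close()
        return None
    if pid == 0:
        # Child: handle the client, then exit without returning to the caller.
        try:
            listener.close()
            conn.sendall(BANNER)
            conn.close()
        finally:
            os._exit(0)
    # Parent: drop its copy of the connection and go back to listening.
    conn.close()
    return pid
```

With a working fork the client receives the banner from the child; with a fork that raises `OSError` the client's first read returns nothing, which matches the "Connection closed by remote host" symptom in the trace.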

Kenster
  • Thanks for the nice breakdown/explanation. I'm still waiting for my friend to install rsh so I can try rsync -e rsh and also a MySQL client connection. It seems likely I'll have to go to the datacenter and just pick up the drives. – htfree Apr 28 '15 at 23:25