
What started as an annoying issue a few weeks back is now driving me mad!

At home, I have an Ubuntu 10.04.3 box which acts as a fileserver. I back up things on it via rsync from other boxes outside the network. When I connect to this fileserver from my laptop, I forward the ssh-agent:

root@fileserver:~# env | grep SSH_AUTH
SSH_AUTH_SOCK=/tmp/ssh-IumRLB2628/agent.2628

There is one box, also running 10.04.3, to which I can't connect. All the others work fine and my SSH keys are forwarded without problems, but this one server just won't have it. This is what I mean:

root@fileserver:~# ssh the-problematic-server -v
OpenSSH_5.3p1 Debian-3ubuntu7, OpenSSL 0.9.8k 25 Mar 2009
debug1: Reading configuration data /root/.ssh/config
debug1: Applying options for myserver
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to the-problematic-server [n.n.n.n] port 22.
debug1: connect to address n.n.n.n port 22: Connection timed out
ssh: connect to host the-problematic-server port 22: Connection timed out

From the same fileserver, to a different box, using the same forwarded ssh-agent:

root@fileserver:~# ssh the-good-server -v
debug1: Reading configuration data /root/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to the-good-server [n.n.n.n] port 22.
debug1: Connection established.
debug1: permanently_set_uid: 0/0
debug1: identity file /root/.ssh/identity type -1
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_dsa type -1
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.3p1 Debian-3ubuntu7
debug1: match: OpenSSH_5.3p1 Debian-3ubuntu7 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.3p1 Debian-3ubuntu7
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-md5 none
debug1: kex: client->server aes128-ctr hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
debug1: Host 'the-good-server.net' is known and matches the RSA host key.
debug1: Found key in /root/.ssh/known_hosts:10
debug1: ssh_rsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey
debug1: Next authentication method: publickey
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv THE FORWARDED KEY
debug1: Offering public key: /Users/gerhard/.ssh/calista_rsa <<<<<< THE FORWARDED KEY
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ THE FORWARDED KEY
debug1: Server accepts key: pkalg ssh-rsa blen 277
debug1: Authentication succeeded (publickey).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: Requesting authentication agent forwarding.
debug1: Sending environment.
debug1: Sending env LANG = en_US.UTF-8
Linux the-good-server 2.6.32-32-generic #62-Ubuntu SMP Wed Apr 20 21:52:38 UTC 2011 x86_64 GNU/Linux
Ubuntu 10.04.3 LTS

And now, for the cherry on top, from the server that I just connected to...

root@the-good-server:~# ssh the-problematic-server -v
OpenSSH_5.3p1 Debian-3ubuntu7, OpenSSL 0.9.8k 25 Mar 2009
debug1: Reading configuration data /root/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to the-problematic-server [n.n.n.n] port 22.
debug1: Connection established.
debug1: permanently_set_uid: 0/0
debug1: identity file /root/.ssh/identity type -1
debug1: identity file /root/.ssh/id_rsa type 1
debug1: Checking blacklist file /usr/share/ssh/blacklist.RSA-2048
debug1: Checking blacklist file /etc/ssh/blacklist.RSA-2048
debug1: identity file /root/.ssh/id_dsa type -1
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.3p1 Debian-3ubuntu7
debug1: match: OpenSSH_5.3p1 Debian-3ubuntu7 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.3p1 Debian-3ubuntu7
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-md5 none
debug1: kex: client->server aes128-ctr hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
debug1: Host 'the-problematic-server' is known and matches the RSA host key.
debug1: Found key in /root/.ssh/known_hosts:1
debug1: ssh_rsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: client_input_channel_open: ctype auth-agent@openssh.com rchan 2 win 65536 max 16384
debug1: channel 1: new [authentication agent connection]
debug1: confirm auth-agent@openssh.com
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv THE FORWARDED KEY AGAIN
debug1: Offering public key: /Users/gerhard/.ssh/calista_rsa <<<<<< THE FORWARDED KEY AGAIN
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ THE FORWARDED KEY AGAIN
debug1: Server accepts key: pkalg ssh-rsa blen 277
debug1: channel 1: FORCE input drain
debug1: Authentication succeeded (publickey).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: channel 1: free: authentication agent connection, nchannels 2
debug1: Requesting authentication agent forwarding.
debug1: Sending environment.
debug1: Sending env LANG = en_US
Linux the-problematic-server 2.6.34.6-64 #3 SMP Fri Sep 17 16:06:38 UTC 2010 x86_64 GNU/Linux
Ubuntu 10.04.3 LTS

I've also tried a different user, and the same thing happens when I try to connect from the fileserver. Nothing gets logged in the auth.log on "the-problematic-server" either, so it seems the connection doesn't even reach sshd.

I am really running out of ideas here, I'm looking for wiser & sharper chops. Cheers!

UPDATE 27.09.2011

root@problematic-server:~# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:25:90:13:b3:a0
          inet addr:188.165.229.62  Bcast:188.165.229.255  Mask:255.255.255.0
          inet6 addr: fe80::225:90ff:fe13:b3a0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4021584924 errors:169 dropped:4562 overruns:0 frame:169
          TX packets:6302335682 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2467184127845 (2.4 TB)  TX bytes:8418184173437 (8.4 TB)
          Memory:febe0000-fec00000

eth0:0    Link encap:Ethernet  HWaddr 00:25:90:13:b3:a0
          inet addr:94.23.121.1  Bcast:94.23.121.1  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Memory:febe0000-fec00000

eth0:1    Link encap:Ethernet  HWaddr 00:25:90:13:b3:a0
          inet addr:94.23.152.36  Bcast:94.23.152.36  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Memory:febe0000-fec00000

eth0:2    Link encap:Ethernet  HWaddr 00:25:90:13:b3:a0
          inet addr:178.32.58.3  Bcast:178.32.58.3  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Memory:febe0000-fec00000

A few arping results:

root@problematic-server:~# arping -D -I eth0 -c 2 188.165.229.62
ARPING 188.165.229.62 from 0.0.0.0 eth0
Sent 2 probes (2 broadcast(s))
Received 0 response(s)
root@opteron16:~# arping -D -I eth0:0 -c 2 94.23.121.1
ARPING 94.23.121.1 from 0.0.0.0 eth0:0
Sent 2 probes (2 broadcast(s))
Received 0 response(s)

UPDATE 29.09.2011

ip route list

root@fileserver:~# ip route list
192.168.1.0/24 dev eth0  proto kernel  scope link  src 192.168.1.2 
default via 192.168.1.1 dev eth0  metric 100

root@problematic-server:~# ip route list
188.165.229.0/24 dev eth0  proto kernel  scope link  src 188.165.229.62 
default via 188.165.229.254 dev eth0  metric 100

dig

root@fileserver:~# dig problematic-server

; <<>> DiG 9.7.0-P1 <<>> problematic-server
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36025
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;problematic-server.    IN  A

;; ANSWER SECTION:
problematic-server. 1016    IN  A   188.165.229.62

;; Query time: 26 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Thu Sep 29 09:32:50 2011
;; MSG SIZE  rcvd: 58

arping

root@fileserver:~# arping -c 5 188.165.229.62
ARPING 188.165.229.62

--- 188.165.229.62 statistics ---
5 packets transmitted, 0 packets received, 100% unanswered
gerhard
  • Can you ping the box? Is a firewall running on the box you're trying to connect to? Anything in /etc/hosts.*? – Andrew Case Sep 25 '11 at 02:10
  • I think you might get more help if you use something other than obscenities to name your servers, at least in your questions. – larsks Sep 25 '11 at 02:30
  • Frustration got the best of me @larsks, thanks for the edit. – gerhard Sep 25 '11 at 09:49
  • @ACase yes, I can ping the box fine. There's no firewall running. To make matters worse, this is an intermittent issue. It sometimes works - like now for example, when I was re-tracing all my steps... – gerhard Sep 26 '11 at 21:15

3 Answers

1

It's probably a network issue. Check if you can ping the box. Check the firewall (iptables) to see if it's blocking your host. Check the /etc/hosts.* files (hosts.allow and hosts.deny) to see if it's denied there.

See if your host or the host you're connecting to may have an IP conflict. You can run 'arping' against the host and see if you get back multiple hardware addresses.
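
Spotting multiple hardware addresses by eye gets error-prone over many probes. Here is a minimal Python sketch, assuming the arping output has been captured into a string. The function name `distinct_macs` and the sample reply lines are hypothetical and not taken from this thread. Reply formats differ between arping implementations, so the regex only looks for anything shaped like a MAC address.

```python
import re

# Six hex octets separated by ':' or '-', e.g. 00:25:90:13:b3:a0
MAC_RE = re.compile(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b")


def distinct_macs(arping_output):
    """Return the set of hardware addresses found in the text, normalised
    to lowercase with ':' separators."""
    return {
        match.group(0).lower().replace("-", ":")
        for match in MAC_RE.finditer(arping_output)
    }


# Made-up sample replies (192.0.2.10 is a documentation address).
SAMPLE = """\
Unicast reply from 192.0.2.10 [00:11:22:33:44:55]  0.8ms
Unicast reply from 192.0.2.10 [00:11:22:33:44:55]  0.7ms
Unicast reply from 192.0.2.10 [66:77:88:99:AA:BB]  1.1ms
"""

macs = distinct_macs(SAMPLE)
if len(macs) > 1:
    # More than one MAC answering for the same IP suggests a conflict.
    print("possible IP conflict:", sorted(macs))
```

Keep in mind that ARP does not cross routers, so arping only gets replies from hosts on the same LAN segment. Run it from a machine in the same subnet as the address being checked.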

Are you doing link aggregation or do you have multiple NICs in either of the hosts? It could be a routing issue.

[update] It looks like you have a problem somewhere along the route from fileserver to problematic-server. For whatever reason, data isn't getting routed between those networks. Do you run the network routers that route this traffic? Sounds like a problem with your router.
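
The "Connection timed out" in the question means the TCP handshake never got an answer. A "Connection refused" would mean the host was reached but nothing was listening on the port. A minimal Python sketch for telling these cases apart without going through ssh follows. The function name `probe_tcp` and the 5-second default are assumptions chosen for illustration.

```python
import socket


def probe_tcp(host, port=22, timeout=5.0):
    """Try a plain TCP connect and classify the outcome as a short string."""
    try:
        # create_connection resolves the name and completes the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No reply at all: packets are dropped somewhere on the path.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing listening.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or no route to host.
        return "error: %s" % exc


# Example call from the client that sometimes cannot connect:
#   print(probe_tcp("188.165.229.62", 22))
```

Running this from both the fileserver and the laptop at the same moment shows whether only one client is affected. A "timeout" from one and "open" from the other points at something between that client and the server rather than at sshd.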

Andrew Case
  • As a matter of fact yes, I have 4 IPs all pointing to a single NIC on the problematic server. Going to add another answer with some shell output. – gerhard Sep 27 '11 at 21:53
  • updated my question with arping results. Not sure if it helped. Everything seems to have been working fine for the best part of yesterday, this morning it stopped working again... – gerhard Sep 28 '11 at 09:18
  • Can you run `ip route list`? Also, what does your DNS entry for this host look like? Are you connecting by hostname or by IP address? See if you have the same problem when you connect by IP address. – Andrew Case Sep 28 '11 at 16:27
  • Whatever IP you've been connecting to in the past (that has been having problems), run an arping against that IP address from the client machine that sometimes can't connect. You want to look for the MAC changing to verify that you don't have an IP conflict with another client. – Andrew Case Sep 28 '11 at 16:33
  • Same problem if I connect to the IP. Edited my question with ip route list & DNS info. When running arping, I get the same results back: `5 packets transmitted, 0 packets received, 100% unanswered`. The 2 hosts are not in the same LAN, not sure if that makes a difference. I've added the exact command that I used for arping to the question. Should I use specific options? – gerhard Sep 29 '11 at 08:40
  • see updated answer – Andrew Case Sep 29 '11 at 20:23
  • I have control over the router in front of the fileserver. I think I know what the problem is. I have a cable modem which I couldn't run in bridge mode (firmware limitation). I just checked and they have pushed a firmware upgrade which adds this feature! – gerhard Sep 29 '11 at 22:14
  • So far, so good: `Local User (MAC=00-17-31-F2-63-7D): 192.168.1.2:58034 -> 188.165.229.62:22 (TCP)` (log from the router behind the cable modem). Now that the cable modem is dumb, I can hunt this bug further in the router. Everything feels nice and snappy, thanks for taking the time to see this through mate! – gerhard Sep 29 '11 at 22:46
0

This doesn't appear to be an SSH agent problem so much as a general network connectivity problem. As such, the usual diagnostic steps apply: ping, tcpdump, check firewalls, etc.
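
Since the failure is intermittent, it can help to record exactly when connectivity changes so that it can be lined up with router logs or a tcpdump capture. This is a minimal Python sketch under stated assumptions. `collapse_transitions` and `watch` are hypothetical helper names, and `check` is any callable you supply that returns a state such as "open" or "timeout".

```python
import time


def collapse_transitions(samples):
    """Keep only the (timestamp, state) samples where the state changed."""
    changes = []
    last = object()  # sentinel that never equals a real state
    for stamp, state in samples:
        if state != last:
            changes.append((stamp, state))
            last = state
    return changes


def watch(check, attempts, interval=60.0, sleep=time.sleep, clock=time.time):
    """Call check() `attempts` times, `interval` seconds apart, and return
    only the points in time where the reported state changed."""
    samples = []
    for i in range(attempts):
        samples.append((clock(), check()))
        if i < attempts - 1:
            sleep(interval)
    return collapse_transitions(samples)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without waiting. In normal use, leave them at their defaults and pass a connectivity check of your own as `check`.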

womble
  • I can ping the box fine. I can also run a traceroute without any problems. What's weirder is that both the fileserver and the laptop are behind the same gateway. I never have any problems connecting from the laptop, but the fileserver is a bit hit and miss. Now, for example, it works without any problems. I did nothing except restart the machine. Oh, and I have cleared the SSH keys on the fileserver, as I was not using them anyway. Still, I would have seen in the verbose output if those keys were the culprit. When it breaks again, I'll come back with more details. – gerhard Sep 26 '11 at 21:20
0

This is still not solved... Log from the LAN router with firewall enabled:

Local User (MAC=00-17-31-F2-63-7D): 192.168.1.2:57335 -> 188.165.229.62:22 (TCP)
[FW_Session][Pass][304/15000][@S:R=13:1, 192.168.1.2:57335->188.165.229.62:22]
[FILTER][Pass][lan->wan, 0:11:58.990][@S:R=13:1, 192.168.1.2:57335->188.165.229.62:22][TCP][HLen=20, TLen=60, Flag=S, Seq=1090671344, Ack=0, Win=5840]

And without the firewall:

Local User (MAC=00-17-31-F2-63-7D): 192.168.1.2:57336 -> 188.165.229.62:22 (TCP)

All that I get is:

 debug1: Connecting to 188.165.229.62 [188.165.229.62] port 22.
 debug1: connect to address 188.165.229.62 port 22: Connection timed out
 ssh: connect to host 188.165.229.62 port 22: Connection timed out

I can ping & traceroute the address fine. I can connect from a laptop within the same home LAN. There's no firewall running on either machine. I'm stumped @ACase.

gerhard