
On the client side, the SFTP application sends packets to the SSH server on port 22.

The SFTP application hands each packet to TCP. In the Ethereal capture we can see the packet pass from the application to TCP and TCP send it to the server, but TCP never receives an ACK from the server. TCP retransmits the packet after a few seconds, but there is still no response from the server.

It seems that the server did not receive the packet from the client.

The client SFTP application waits in select() for the TCP receive with a timeout of 120 seconds. After 120 seconds, select() times out and the application aborts the SFTP operation with a timeout error.

In the capture I can see TCP retransmit the packet many times, but it never receives the server's TCP ACK.

Scenario:

  1. The timeout happens only occasionally.
  2. After the failure, the next SFTP operation (an upload) to the same server succeeds.
  3. The network seems to be fine, because the next upload works.
  4. Both the client and the server run Solaris.
  5. We are unable to reproduce this in our lab environment.
  6. The issue happens only in the real customer network.
  7. The application is written in C. The SSH server is OpenSSH.

I want to know:

  1. How can we find the reason why TCP does not receive the ACK reply from the server?
  2. Is there any TCP system setting in Solaris that could cause this issue?
  3. Please provide any information that could help us resolve this issue.

Mike Pennington
Syedsma

1 Answer


I assume your topology looks like this:

           10.25.190.12               10.10.10.10
           [e1000g0]                  [eth0]
SFTP_Client--------------Network------------OpenSSH_Server

There are two things you need to do:

  1. Establish whether there is regular, significant packet loss between your client and server. TCP tolerates some packet loss, but if you start dropping a lot (which is honestly hard to quantify), it will just give up in some circumstances. I would suggest two ways of detecting packet loss: the first is `mtr`, the second is `ping`. `mtr` is far preferable, since you get loss statistics per hop (see below). Run `mtr 10.10.10.10` from the client and `mtr 10.25.190.12` from the server. Occasionally, packet loss is path-dependent, so it's useful to do it from both sides when you really want to nail down the source of it. If you see packet loss, work with your network administrators to fix it first; you're wasting your time otherwise. In the process of fixing the packet loss, it's possible you will fix this TCP ACK problem as well.

  2. If there is no significant packet loss, you need to sniff both sides of the connection simultaneously with `snoop` or `tshark` (you can get `tshark` from SunFreeware) until you see the problem again. Restrict your sniffs to the IP addresses of the client and server. When you catch a case of missing TCP ACKs, figure out whether: A) the OpenSSH_Server sent the ACK, and B) the SFTP_Client received it. If the client gets the ACK on its ethernet interface, then you probably need to start looking in your software for clues. In my experience, a host-side problem like that is possible but not common; 90+% of the time, it's just network packet loss.
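
The A/B check in step 2 can be sketched in a few lines of Python. This is only an illustration under assumptions: `Segment` and `diagnose_missing_ack` are hypothetical names, the two captures are assumed to have already been exported from `snoop` or `tshark` into simple per-segment records, and sequence-number wraparound is ignored.

```python
from typing import NamedTuple, Sequence


class Segment(NamedTuple):
    """One TCP segment taken from a capture (hypothetical record layout)."""
    src: str
    dst: str
    seq: int
    ack: int
    payload_len: int


def diagnose_missing_ack(client_cap: Sequence[Segment],
                         server_cap: Sequence[Segment],
                         client_ip: str, server_ip: str,
                         seq: int, payload_len: int) -> str:
    """Say where a retransmitted data segment or its ACK went missing."""
    # An ACK covers the segment once it acknowledges seq + payload_len.
    # Assumption: no sequence-number wraparound inside the capture.
    expected_ack = seq + payload_len

    def data_seen(cap: Sequence[Segment]) -> bool:
        return any(s.src == client_ip and s.dst == server_ip
                   and s.seq == seq and s.payload_len > 0 for s in cap)

    def ack_seen(cap: Sequence[Segment]) -> bool:
        return any(s.src == server_ip and s.dst == client_ip
                   and s.ack >= expected_ack for s in cap)

    if not data_seen(server_cap):
        return "data segment never reached the server"
    if not ack_seen(server_cap):
        return "server received the data but never sent the ACK"
    if not ack_seen(client_cap):
        return "ACK left the server but was lost before the client"
    return "ACK reached the client interface; look at the client software"
```

The first and third results point at the network in one direction or the other, the second at the server host, and the last at the client host or application.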

Sample output from mtr:

mpenning@mpenning-T61:~$ mtr -n 4.2.2.4
HOST: mpenning-T61              Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. 10.239.84.1                0.0%    407    8.8   9.1   7.7  11.0   1.0
  2. 66.68.3.223                0.0%    407   11.5   9.2   7.1  11.5   1.3
  3. 66.68.0.8                  0.0%    407   19.9  16.7  11.2  21.4   3.5
  4. 72.179.205.58              0.0%    407   18.5  23.7  18.5  28.9   4.0
  5. 66.109.6.108               5.2%    407   16.6  17.3  15.5  20.7   1.5 <----
  6. 66.109.6.181               4.8%    407   18.2  19.1  16.8  23.6   2.3
  7. 4.59.32.21                 6.3%    407   20.5  26.1  19.5  68.2  14.9
  8. 4.69.145.195               6.4%    406   21.4  27.6  19.8  79.1  18.1
  9. 4.2.2.4                    6.8%    406   22.3  23.3  19.4  32.1   3.7
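
If you save reports like this to a file, a small script can point at the hop where loss begins. The sketch below is a rough illustration, not part of `mtr`: `parse_hops` and `where_loss_begins` are hypothetical helpers, the line layout is assumed to match the sample above, and the 1% threshold is an arbitrary choice. It only reports loss that continues all the way to the final hop, since loss shown at a single intermediate hop is often just that router rate-limiting its ICMP replies.

```python
import re

# Matches hop lines such as "  5. 66.109.6.108   5.2%   407 ...".
# The optional "|--" is tolerated because some mtr versions print it.
HOP_RE = re.compile(r"^\s*(\d+)\.(?:\|--)?\s+(\S+)\s+([\d.]+)%")

SAMPLE = """\
HOST: mpenning-T61              Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. 10.239.84.1                0.0%    407    8.8   9.1   7.7  11.0   1.0
  2. 66.68.3.223                0.0%    407   11.5   9.2   7.1  11.5   1.3
  3. 66.68.0.8                  0.0%    407   19.9  16.7  11.2  21.4   3.5
  4. 72.179.205.58              0.0%    407   18.5  23.7  18.5  28.9   4.0
  5. 66.109.6.108               5.2%    407   16.6  17.3  15.5  20.7   1.5
  6. 66.109.6.181               4.8%    407   18.2  19.1  16.8  23.6   2.3
  7. 4.59.32.21                 6.3%    407   20.5  26.1  19.5  68.2  14.9
  8. 4.69.145.195               6.4%    406   21.4  27.6  19.8  79.1  18.1
  9. 4.2.2.4                    6.8%    406   22.3  23.3  19.4  32.1   3.7
"""


def parse_hops(report):
    """Return (hop_number, host, loss_percent) for every hop line found."""
    hops = []
    for line in report.splitlines():
        match = HOP_RE.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2),
                         float(match.group(3))))
    return hops


def where_loss_begins(report, threshold=1.0):
    """Return the first hop of the lossy run that reaches the last hop.

    Returns None when the final hop shows less than `threshold` percent loss.
    """
    start = None
    for hop in parse_hops(report):
        if hop[2] >= threshold:
            if start is None:
                start = hop
        else:
            start = None  # loss did not persist; forget the earlier hop
    return start


if __name__ == "__main__":
    print(where_loss_begins(SAMPLE))  # (5, '66.109.6.108', 5.2)
```

For the sample above this picks hop 5, the same hop marked with the arrow.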
Mike Pennington
  • Thanks Mike for your detailed information. 1. The `mtr` command is not available on the machine; why? 2. How can we confirm that we are restricting the sniffs to the IP addresses of the client and server? – Syedsma Dec 08 '11 at 10:34
  • @Syedsma, unless you can find `mtr` on SunFreeware (and I can't), you'll have to compile it from the [`mtr` source](ftp://ftp.bitwizard.nl/mtr/). Using the example above, you should use `snoop -o cap_filename -d e1000g0 tcp and 10.25.190.12 and 10.10.10.10`... `-o cap_filename` saves the capture to a file instead of sending it to the terminal... that part is optional. – Mike Pennington Dec 08 '11 at 11:17
  • (1) I can't install `mtr` on the network machine; I don't have permission. (2) I got the capture in a file. Is there any other way to find the root cause of the TCP timeout caused by no reply from the server? – Syedsma Dec 08 '11 at 11:30
  • @Syedsma, assuming it is packet loss (most of the time it is)... you could manually run `traceroute 10.10.10.10` on the client, `traceroute 10.25.190.12` on the server, then simultaneously `ping` every hop in the path (each one in a different terminal window). Look for where in the network the packet loss begins... honestly most of the time, you should be responsible just for pings between the client and server... let your network administrators do the rest, if you find significant `ping` loss – Mike Pennington Dec 08 '11 at 11:43