We do most of our work over SSH on colocated servers in a datacenter, so we're connected to the boxes almost all day, five days a week. Intermittently, we see a lag between typing on the keyboard and having the characters echoed back to us in the shell. I started doing some digging, but I'm having trouble understanding the results, and I'm looking for next steps. Earlier, I ran a Wireshark trace filtered on tcp.dstport == 22, which seems to be where we have the majority of the problems. I noticed a largish number of packets (10-20 out of several thousand) that were TCP retransmissions. I assume this is related to the lag we're seeing.
1) mtr to remote host
                       Packets                  Pings
Host                  Loss%   Snt   Last    Avg   Best   Wrst  StDev
1.  192.168.100.254   76.6%   454    0.5    0.5    0.3    4.7    0.4
2.  10.113.128.1      80.6%   454   17.3  130.8    5.7  6030.  726.7
3.  74.128.19.209     79.5%   454    9.7   25.8    6.7  1270.  133.2
4.  74.128.8.233      80.6%   454    8.5   31.9    6.6  1369.  150.6
5.  4.71.250.1        79.2%   454  1547.   50.5   14.7  1547.  194.1
6.  4.69.138.158      80.4%   454   20.1   29.7   15.4  1003.  104.5
7.  4.69.140.189      74.2%   454   16.2   28.6   15.0  920.0   85.5
8.  4.69.138.4        72.6%   454   17.0   41.2   15.5  821.6   81.7
9.  ???
10. 216.26.190.9      79.4%   453   45.2  105.8   24.4  3008.  406.7
11. 216.26.162.162    90.7%   453   28.3   40.2   24.1  556.3   81.7
2) mtr to 192.168.100.254 (run at the same time as the mtr above)
                       Packets                  Pings
Host                  Loss%   Snt   Last    Avg   Best   Wrst  StDev
1.  192.168.100.254    0.0%   591    0.8    0.4    0.3    6.9    0.5
First question: why does the top mtr suggest packet loss at 192.168.100.254, when the bottom one does not?
Second question: how can I better determine what might be causing this?
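To compare runs like the two above without eyeballing them, here is a minimal Python sketch that parses pasted mtr rows and reports, per host, how much the loss figure differs between two runs. It assumes the row layout shown in this post (hop number, host, Loss%, Snt, then the ping columns); the names `Hop`, `parse_mtr`, and `loss_difference` are made up for illustration and are not part of mtr.

```python
import re
from typing import Dict, List, NamedTuple


class Hop(NamedTuple):
    index: int       # hop number as printed by mtr
    host: str        # IP address or hostname
    loss_pct: float  # value of the Loss% column
    sent: int        # value of the Snt column


# Matches rows such as "1. 192.168.100.254 76.6% 454 0.5 0.5 0.3 4.7 0.4".
# Header lines and unresolved hops ("9. ???") do not match and are skipped.
ROW = re.compile(r"^\s*(\d+)\.\s+(\S+)\s+(\d+(?:\.\d+)?)%\s+(\d+)\b")


def parse_mtr(text: str) -> List[Hop]:
    """Extract hop number, host, loss and sent count from pasted mtr rows."""
    hops = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            hops.append(
                Hop(int(m.group(1)), m.group(2), float(m.group(3)), int(m.group(4)))
            )
    return hops


def loss_difference(run_a: str, run_b: str) -> Dict[str, float]:
    """Loss% in run_a minus Loss% in run_b, for hosts that appear in both runs."""
    loss_b = {h.host: h.loss_pct for h in parse_mtr(run_b)}
    return {
        h.host: round(h.loss_pct - loss_b[h.host], 1)
        for h in parse_mtr(run_a)
        if h.host in loss_b
    }


if __name__ == "__main__":
    # Rows copied from the two runs above (first run shortened to three rows).
    to_remote = (
        "1. 192.168.100.254 76.6% 454 0.5 0.5 0.3 4.7 0.4\n"
        "2. 10.113.128.1 80.6% 454 17.3 130.8 5.7 6030. 726.7\n"
        "9. ???\n"
    )
    to_gateway = "1. 192.168.100.254 0.0% 591 0.8 0.4 0.3 6.9 0.5\n"
    print(loss_difference(to_remote, to_gateway))
```

This only quantifies the discrepancy I'm asking about (the same gateway shows very different loss depending on the mtr target); it doesn't explain it.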
EDIT:
mtr to the first host outside our network:
                                   Packets                  Pings
Host                              Loss%   Snt   Last    Avg   Best   Wrst  StDev
1.  edge.networldalliance.local   18.1%   393    0.5    0.5    0.4    1.8    0.2
2.  10.113.128.1                   0.0%   393   10.0   10.1    5.5  744.3   37.4
Separate mtr to the second host outside our network:
                                        Packets                  Pings
Host                                   Loss%   Snt   Last    Avg   Best   Wrst  StDev
1.  edge.networldalliance.local        87.9%   424    0.8    0.7    0.5    1.2    0.1
2.  10.113.128.1                        0.0%   424    9.5    9.5    5.2  577.8   27.8
3.  74-128-19-209.dhcp.insightbb.com    0.0%   423    6.5   10.4    6.2  243.9   12.8
Separate mtr (again) to the third host outside our network:
                                        Packets                  Pings
Host                                   Loss%   Snt   Last    Avg   Best   Wrst  StDev
1.  edge.networldalliance.local        87.2%   440    0.6    0.7    0.4    2.2    0.3
2.  10.113.128.1                        0.0%   439    6.4   10.9    5.6  991.8   47.2
3.  74-128-19-209.dhcp.insightbb.com    0.0%   439    8.5   13.3    6.5  744.3   35.6
4.  74.128.8.233                        0.0%   439    7.9   23.6    6.3  493.8   47.2
Any suggestions based on this new data? I'm going to see about getting the router / firewall replaced.