
I have a TCP server built with Python sockets. The application I'm building is time-sensitive and data integrity matters, so we need TCP. The bandwidth used is very low.

There is also a client that requests data from the server every 50 ms. The server responds with either an OK message (if it has no data) or the requested data.

Each client request is a 5-byte frame (not counting the 40 bytes of IP and TCP headers). The server responds with either a 5-byte frame (in most cases) or a frame of more than 70 bytes (typically once per second).
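TCP is a byte stream, so nothing in it marks where a 5-byte reply ends and a 70-byte one begins, and the question does not say how the client tells them apart. A minimal sketch of one common approach, assuming a hypothetical 2-byte big-endian length prefix (not the original protocol), that frames both reply sizes and reads exactly one frame at a time:

```python
import socket
import struct

# Hypothetical framing: a 2-byte big-endian length header followed by the payload.
# The real protocol's layout is not described in the question.
HEADER = struct.Struct("!H")


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; TCP may deliver one frame in several pieces."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send header and payload in one call so TCP_NODELAY doesn't split them into two segments."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_frame(sock: socket.socket) -> bytes:
    """Read one frame, whether it is a short OK or a longer data frame."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


if __name__ == "__main__":
    a, b = socket.socketpair()
    send_frame(a, b"OK")
    send_frame(a, bytes(72))
    print(recv_frame(b))       # b'OK'
    print(len(recv_frame(b)))  # 72
    a.close()
    b.close()
```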

On both sides the sockets are set up like this:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # server only; omitted on the client
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 8192)   # 8 KiB send buffer
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)   # disable Nagle's algorithm
sock.settimeout(0.5)                                          # blocking calls time out after 0.5 s
```
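To correlate the lag with what the application sees, the client can time each request/response round trip itself. A minimal sketch, assuming a hypothetical server address, port and 5-byte request payload; it measures the time until the first response bytes arrive and assumes each response fits in one `recv()` (true on a quiet LAN, not guaranteed in general):

```python
import socket
import time

# Placeholders (assumptions): replace with the real address, port and request bytes.
SERVER = ("127.0.0.1", 5000)   # hypothetical
REQUEST = b"REQ01"             # hypothetical 5-byte request
POLL_INTERVAL = 0.05           # 50 ms, as described in the question


def make_client_socket(timeout: float = 0.5) -> socket.socket:
    """Create a client socket with the same options as the question's snippet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 8192)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.settimeout(timeout)
    return sock


def poll(sock: socket.socket, count: int, request: bytes = REQUEST,
         interval: float = POLL_INTERVAL) -> list:
    """Send `count` requests on a fixed schedule; return each round-trip time in seconds."""
    rtts = []
    next_send = time.monotonic()
    for _ in range(count):
        start = time.perf_counter()
        sock.sendall(request)
        # Raises socket.timeout (TimeoutError on 3.10+) if nothing arrives in time.
        data = sock.recv(4096)
        if not data:
            raise ConnectionError("server closed the connection")
        rtts.append(time.perf_counter() - start)
        next_send += interval
        time.sleep(max(0.0, next_send - time.monotonic()))
    return rtts
```

Logging the returned values alongside the Wireshark timestamps makes it easier to check whether each latency spike coincides with retransmissions.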

Everything runs fine on the local network (no lag at all), but whenever I connect to the server through its public IP (I'm using port forwarding), it lags badly. The lag can reach 15 seconds (at which point the connection times out), which is far too long. Most of the time the RTT stays at 200-210 ms. In Wireshark I can see many (spurious) retransmissions and duplicate ACKs.
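Stalls of this length are consistent with TCP retransmission backoff: each time the retransmission timer expires, the sender doubles its RTO (RFC 6298), so a few consecutive losses add up quickly. A sketch of that arithmetic, assuming an initial RTO of 0.25 s and a 60 s cap (real values depend on the operating system and measured RTT):

```python
def backoff_schedule(initial_rto: float, max_retries: int, max_rto: float = 60.0) -> list:
    """Times (seconds after the original send) at which each retransmission fires."""
    times = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(max_retries):
        elapsed += rto                 # wait one RTO, then retransmit
        times.append(elapsed)
        rto = min(rto * 2, max_rto)    # exponential backoff, capped
    return times


print(backoff_schedule(0.25, 6))  # [0.25, 0.75, 1.75, 3.75, 7.75, 15.75]
```

Under these assumed values, six unanswered retransmissions already span almost 16 seconds, which is the order of magnitude of the observed lag.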

What can I do? I've already disabled Nagle's algorithm, without success.

  • Hi Roberteagle, can you upload the Wireshark capture somewhere? Alternatively I can provide you an email address to email it to. – Mark Riddell Aug 21 '16 at 06:50
  • @MarkoPolo [screenshot](http://i.imgur.com/csElvDs.png). The server is bound to 192.168.0.15. Both clients run on a VM with the address 192.168.0.26. The clients connect to the IP 89.137.123.51, which is forwarded to the server we are discussing. – Robert Lucian Chiriac Aug 21 '16 at 07:13
  • This is something I just captured. It caught my attention because the RTT jumped a lot for no apparent reason. The data on this client is received every second. [screenshot](http://i.imgur.com/RdgDvTM.png) – Robert Lucian Chiriac Aug 21 '16 at 07:22
  • I really need to see the actual Wireshark capture to make sense of what is going on. Please confirm my understanding: 192.168.0.15 is the local address of the server, 89.137.123.51 is the public address of the server which clients connect to, and 192.168.0.26 is the local address of a client. Your client is connecting to 89.137.123.51 from inside the same network as the server, i.e. your firewall is performing hairpin NAT. If so, can you replicate this when using a client outside of your own network? Please also confirm whether you took the capture on the client or the server (I'm assuming the client). – Mark Riddell Aug 21 '16 at 07:43
  • Yes, you understood correctly. I can try to replicate the situation from outside my network; I'll do it now, using a Wi-Fi hotspot on my phone. Regarding Wireshark, I have to admit I don't know how to use it properly, so I don't understand exactly what you're asking me to do. For lack of a better option, I was "forced" to "use" Wireshark, because I couldn't find a viable solution before. It's a totally new tool for me. What do you suggest? – Robert Lucian Chiriac Aug 21 '16 at 07:52
  • If you're available at the moment and it's necessary, I could show you through Skype or TeamViewer. – Robert Lucian Chiriac Aug 21 '16 at 07:56
  • Thanks Robert. For Wireshark, if you can start a fresh capture, demonstrate the issue, stop the capture and save it. Then if you can upload it somewhere, I can download the file and examine it locally in Wireshark. Alternatively, I can give you an email address to mail the file to. – Mark Riddell Aug 21 '16 at 07:57
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/44265/discussion-between-markopolo-and-roberteagle). – Mark Riddell Aug 21 '16 at 08:02
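In the hairpin NAT setup described in the comments, the router rewrites the destination (public IP to the server's LAN IP) and typically also the source, so that the server's replies return through the router instead of going straight to the client. A toy model of that translation, assuming a hypothetical router LAN address of 192.168.0.1 and a hypothetical forwarded port 5000; whether this particular router rewrites the source exactly this way is an assumption:

```python
from typing import Dict, NamedTuple, Tuple

Endpoint = Tuple[str, int]

# Public and server IPs come from the thread; the router LAN IP and port are hypothetical.
PUBLIC_IP = "89.137.123.51"
SERVER_IP = "192.168.0.15"
ROUTER_LAN_IP = "192.168.0.1"  # assumption
SERVICE_PORT = 5000            # assumption: the forwarded port


class Packet(NamedTuple):
    src: Endpoint
    dst: Endpoint


class HairpinNat:
    """Toy model of port forwarding plus hairpin NAT (destination and source rewriting)."""

    def __init__(self) -> None:
        self.port_for_client: Dict[Endpoint, int] = {}
        self.client_for_port: Dict[int, Endpoint] = {}
        self.next_port = 40000

    def to_server(self, pkt: Packet) -> Packet:
        """LAN client -> public IP: forward to the server, with the router as the source."""
        if pkt.dst != (PUBLIC_IP, SERVICE_PORT):
            raise ValueError("packet is not addressed to the forwarded port")
        port = self.port_for_client.get(pkt.src)
        if port is None:  # first packet of this session: allocate a mapping
            port = self.next_port
            self.next_port += 1
            self.port_for_client[pkt.src] = port
            self.client_for_port[port] = pkt.src
        return Packet(src=(ROUTER_LAN_IP, port), dst=(SERVER_IP, SERVICE_PORT))

    def to_client(self, pkt: Packet) -> Packet:
        """Server reply -> original client, made to look as if it came from the public IP."""
        if pkt.src != (SERVER_IP, SERVICE_PORT) or pkt.dst[0] != ROUTER_LAN_IP:
            raise ValueError("not a reply to a forwarded session")
        client = self.client_for_port[pkt.dst[1]]
        return Packet(src=(PUBLIC_IP, SERVICE_PORT), dst=client)
```

If the router loses packets at the `to_client` step, the server believes its data was delivered while the client keeps retransmitting.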



I've had a good look over the capture files provided and here is my analysis. In summary, I believe this is an issue with your router, which appears to be a Technicolor device of some sort.

Client Side Capture

  • Your client is having major issues trying to connect to a variety of websites. HTTPS websites (www.bing.com, wdcp.microsoft.com etc) are getting no response after the Client Hello stage resulting in retransmissions and eventual timeout from your device. Another set of HTTP requests to an Akamai hosted website (104.90.152.18) is resulting in a 408 Request Time-out.
  • Looking specifically at the traffic from the client to the server, the vast majority of the sessions start reasonably well but then encounter packet loss, resulting in retransmissions from the client and timeouts. For example, examine packets #161 to #207. At packet #161 the client sends a data packet to the server but gets no response, so the client retransmits for around 15 seconds before the connection is torn down.

    The majority of the TCP streams demonstrate this behaviour, so we can conclude that either the data packets from the client are not reaching the server OR the responses from the server are not reaching the client.

  • Looking at the latency, there is a significant (and volatile) delay between the SYN and SYN/ACK response from the server, ranging from 168ms to 770ms.
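The handshake delay can also be sampled from the client side without Wireshark: `socket.create_connection()` returns once the three-way handshake has completed, so timing it approximates the SYN to SYN/ACK round trip. A minimal sketch (pass an IP address rather than a hostname so DNS lookup time is not included):

```python
import socket
import statistics
import time


def handshake_times(host: str, port: int, attempts: int = 5, timeout: float = 3.0) -> list:
    """Time repeated TCP connects; each sample roughly covers SYN -> SYN/ACK."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: list) -> dict:
    """Min/median/max show how volatile the handshake latency is."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

A wide spread between `min` and `max`, like the 168 ms to 770 ms range seen in the capture, points to queuing or loss on the path rather than a fixed propagation delay.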

Server Side Capture

  • Unfortunately, the server-side capture does not contain the same events as the client-side capture. I am also unsure where exactly in the network it was captured, as it includes both client and server traffic. ICMP redirects are also being sent, which indicates sub-optimal routing; however, I do not believe this is causing the issue.
  • If you apply the Wireshark display filter `tcp.stream eq 1 || tcp.stream eq 2` you can see both sides of the communication: Client > Firewall and then Firewall > Server (and vice versa). Again, everything starts OK and then, around packet #407, things get interesting.

    Packet #407 marks the point when the client sends a chunk of new data to the server. The router receives this and forwards it to the server. The server sends an acknowledgement packet back (#410) as well as another small data packet (#411). What we don't see, however, is the router passing these packets back to the client. This is the best evidence I have found that this is a router issue.

Compare this to one of the many successful exchanges slightly further up in the trace - packet 394 to 406 for example:

  1. (#394) Client sends a data packet to the public IP of the server
  2. (#396) Router receives this and forwards it to the local IP of the server
  3. (#397) Server sends an acknowledgement back to the NAT'd IP of the client
  4. (#398) Server sends a small data packet back to the NAT'd IP of the client
  5. (#401) Router sends the acknowledgement back to the client's local IP
  6. (#402) Router sends the small data packet back to the client's local IP
  7. (#403) Client sends an acknowledgement back to the public IP of the server to confirm it received the data the server sent
  8. (#406) The router forwards the acknowledgement to the local IP of the server.

When things fail, everything stops after stage 4 - the two packets sent from the server appear to be dropped at the router.
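The comparison above amounts to walking the eight expected stages in order and finding where the chain stops. A sketch of that check, using hypothetical `(sender, receiver, kind)` labels in place of real packets, where "router" stands for both the server's public IP and the client's NAT'd address:

```python
# The eight stages of a healthy exchange, as listed above (packets #394 to #406).
STAGES = [
    ("client", "router", "data"),  # 1. client -> public IP of server
    ("router", "server", "data"),  # 2. router -> server's local IP
    ("server", "router", "ack"),   # 3. server ACK -> NAT'd client IP
    ("server", "router", "data"),  # 4. server data -> NAT'd client IP
    ("router", "client", "ack"),   # 5. router -> client's local IP
    ("router", "client", "data"),  # 6. router -> client's local IP
    ("client", "router", "ack"),   # 7. client ACK -> public IP of server
    ("router", "server", "ack"),   # 8. router -> server's local IP
]


def last_completed_stage(observed: list) -> int:
    """Count how many stages appear, in order, among the observed hops."""
    done = 0
    for hop in observed:
        if done < len(STAGES) and hop == STAGES[done]:
            done += 1
    return done


failed = STAGES[:4] + [("client", "router", "data")]  # client retransmits
print(last_completed_stage(STAGES))  # 8
print(last_completed_stage(failed))  # 4
```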

Final Thoughts

  • Most of your TCP connections, not just your Python application, seem to be suffering from performance issues, as demonstrated by the many connection failures in your client-side capture.
  • There is reasonable evidence in your server-side capture that packets are being blackholed when they have to be forwarded through your router.
  • Your testing has shown that there are no issues when you test this application locally, where traffic does not need to traverse the router for port forwarding.
  • Unfortunately, I am not familiar with Technicolor routers at all, and the only thing I can suggest is to check whether any firewall or Quality of Service rules are enabled on the router that could be impacting performance. If possible, test with an alternative router, or host your application on another network, to see whether the issues persist.
Mark Riddell
  • My router is indeed a Technicolor, but it wasn't my choice. This router is what the ISP gave me, and I cannot use another router because that's their policy. I think I have 2 options here: check the QoS settings on this router, or add another router behind the existing one. I'm not sure the second option would help at all, but that's what I've got. – Robert Lucian Chiriac Aug 21 '16 at 12:30
  • Perhaps you could get in touch with the ISP and see if they can detect any issues or perhaps offer a replacement? – Mark Riddell Aug 21 '16 at 12:39
  • I've sent you messages in the chat. That might be an option. Isn't there a way of working around this, maybe with another router connected to this one, or something else? – Robert Lucian Chiriac Aug 21 '16 at 12:44
  • I put the router in bridge mode and connected another router to it. Everything runs fine now: the RTT stays at a consistent 65-70 ms (I also reduced the delay between send/receive calls from 50 ms to 30 ms). What's more surprising, it works fine even without setting up QoS. So the problem was with the Technicolor router, not the application. Thank you a lot, MarkoPolo :) – Robert Lucian Chiriac Aug 21 '16 at 15:28
  • That's great news! Glad you are up and running fully now. Cheers. – Mark Riddell Aug 21 '16 at 15:33
  • Who is your ISP? Comcast used Technicolor while I worked for them, and those devices were garbage. I recommend the Motorola SURFboard. Besides the router, it could be a problem with noise due to termination or cabling issues after the demarcation point. – Jonathon Anderson Aug 21 '16 at 21:33
  • @NonSecwitter My ISP is [UPC](http://www.upc.ro). I have decent speeds of half a gigabit and low latency, but I can attest that their router is garbage. They've already replaced the router twice in 2 years, since the previous ones broke down. What happened to the old days when they gave you only the cable and you weren't forced to accept their router? – Robert Lucian Chiriac Aug 23 '16 at 04:23