
My setup:

  1. I am using an IP and port provided by portmap.io to allow me to perform port forwarding.
  2. I have OpenVPN installed (as required by portmap.io), and I run a ready-made config file when I want to operate my project.
  3. My main effort involves sending messages between a client and a server using sockets in Python.
  4. I have installed a tool called tcping, which basically allows me to ping an IP:port over a TCP connection.

This figure basically sums it up:

[Figure: Setup]

Results I'm getting:

  1. When I try to "ping" said IP with tcping, the average RTT is consistently around 30 ms.
  2. When I use the same IP for socket programming in Python, with a server script running on my machine and a client script on any other machine connecting to this IP, a small message like "Hello" takes significantly longer to travel across, and inconsistently so: sometimes 1 second, sometimes 400 ms...
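The experiment in step 2 can be sketched minimally like this (host and port are placeholders; in the real setup the client would connect to the portmap.io-forwarded IP:port):

```python
import socket
import time

def run_echo_server(host="0.0.0.0", port=5555):
    # Minimal echo server: accept one connection, echo one message back.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            conn.sendall(conn.recv(1024))

def measure_rtt(host, port, payload=b"Hello"):
    # Time the application-level round trip: send a message, wait for the echo.
    with socket.create_connection((host, port)) as s:
        start = time.perf_counter()
        s.sendall(payload)
        echoed = s.recv(1024)
        elapsed = time.perf_counter() - start
    return elapsed, echoed
```

Note this times only the send/echo exchange, after the connection is already up; including `create_connection` in the timed region would add the handshake as well.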

What is the reason for this discrepancy?

1 Answer


What is the reason for this discrepancy?

tcping just measures the time needed to establish the TCP connection. Connection establishment is usually handled entirely in the OS kernel, so there is not even a switch to user space involved.
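What tcping reports can be approximated in Python by timing only the `connect()` call (a sketch; `tcp_connect_time` is a hypothetical helper name):

```python
import socket
import time

def tcp_connect_time(host, port, timeout=5.0):
    # Time only the TCP three-way handshake, which is what tcping measures.
    # connect() returns as soon as the kernel completes the handshake;
    # no application data is exchanged and the server app never has to run.
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        return time.perf_counter() - start
```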

Even a small data exchange at the application level is significantly more expensive. First, the initial TCP handshake must complete; usually only then does the client start sending the payload. The payload then has to be delivered to the other side and put into the socket's read buffer, the user-space application has to be scheduled to run, read the data from the buffer, and process it, and then create the response and hand it to the peer's OS kernel. That kernel delivers the response back to the local system, where a similar chain of steps runs until the local app finally receives the response and stops timing how long all of this took.
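The gap between the two measurements can be made visible by timing the handshake and the payload round trip separately (a sketch; `timed_exchange` is a hypothetical helper and assumes an echo server on the other end):

```python
import socket
import time

def timed_exchange(host, port, payload=b"Hello"):
    # Separate the handshake time (roughly what tcping shows) from the
    # payload round trip (what the client script actually experiences).
    t0 = time.perf_counter()
    s = socket.create_connection((host, port))
    handshake = time.perf_counter() - t0

    t1 = time.perf_counter()
    s.sendall(payload)
    s.recv(1024)
    payload_rtt = time.perf_counter() - t1
    s.close()
    return handshake, payload_rtt
```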

Given how far the measured time deviates from the pure RTT, though, I would assume that the server system has either low performance or high load, or that the application is written badly.

Steffen Ullrich
  • Thanks for the insight... To isolate inefficiencies in the script itself, I've written a very primitive pair of client/server socket scripts just to measure the actual RTT, so I doubt the trouble is there. But can you clarify a bit what you mean by the server having low performance or high load? Do you think there is anything I can do to improve the RTT? Any general checklist of things I can try to configure or double-check (i.e. settings on my PC, on OpenVPN, etc.)? – OrangeJusticeV Apr 02 '21 at 14:08
  • @OrangeJusticeV: Even very primitive clients and servers still run in user space and still read and write data after the initial TCP handshake has happened. They are thus subject to what I've described. High load or low performance on the server system might lead to scheduling problems, i.e. delays between a kernel event (connection established, data read) and the user app acting on that event in user space. If you do a packet capture of the activity (with wireshark, tcpdump, ...) you might see the timings more clearly. – Steffen Ullrich Apr 02 '21 at 15:40
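One way to see the scheduling jitter described above, short of a packet capture, is to take several application-level samples and compare the spread (a sketch; `sample_rtts` is a hypothetical helper and assumes an echo server at host:port):

```python
import socket
import statistics
import time

def sample_rtts(host, port, n=10, payload=b"Hello"):
    # Collect several application-level RTT samples. A large gap between
    # min and max points at scheduling/load jitter rather than the network path.
    samples = []
    for _ in range(n):
        with socket.create_connection((host, port)) as s:
            start = time.perf_counter()
            s.sendall(payload)
            s.recv(1024)
            samples.append(time.perf_counter() - start)
    return min(samples), statistics.mean(samples), max(samples)
```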