I am trying to make two of NVIDIA's Jetson AGXs communicate via Ethernet with as low latency as possible using the UDP protocol. The default request-response latency measured by netperf is around 200 microseconds. I am looking for ways to reduce this, and all suggestions are welcome.

Looking at the network stack, I came across the fact that the Jetson uses little-endian byte order while the network uses big-endian order. So, for a request-response scenario, byte-order conversion needs to be done four times:

start
Host sender: LE -> BE -> send to client
Client receiver: BE -> LE
Client sender: LE -> BE -> send to host
Host receiver: BE -> LE
end

Of course, this is a very simplified picture, and I have omitted all parts of the stack unrelated to byte ordering. My question is: does this four-time conversion impact the latency significantly? If one were to use systems with big-endian ordering, everything else remaining the same, would that reduce network latency by any measurable amount?

    Most data sent over the network is in bytes, not larger sizes that need to be reordered. It generally only impacts a few fields in the packet headers. – Barmar Dec 24 '21 at 03:29
  • What Barmar said. Data within a packet is sent byte by byte; network order there is for the header fields. It is customary to send data in network order too, which involves swapping before sending and after receiving. That's done for portability, but your communication protocol may agree on sending raw data in native order if you never plan for the endianness to change. Also, for low latency, you may look into the differences between UDP and TCP. – Swift - Friday Pie Dec 24 '21 at 04:00

1 Answer

In the most simplistic terms, the end-to-end latency seen by a netperf TCP_RR test is: TimeInNetperfSendRequest + TimeInStackToSend + TimeInNICToSend + TimeOnNetwork + TimeInNICToReceive + TimeInStackToReceive + TimeInNetserverRecvRequest, and then all of that in reverse to send the response.

The CPU time to send or receive a packet is largely (not entirely, but largely) all in the latency path. So, if you include CPU utilization measurements in your netperf tests, you can see what the microseconds of CPU time per transaction happen to be. That will cover all the stuff listed above except the NIC and Network bits. You can then subtract that from the overall transaction latency to see just how much of the latency you are seeing is from packet processing, and how much is from "the NICs and network" and go from there to optimize.

Side note: While netperf will do endian normalization for what it passes across the control connection, it does not do so for the data on the data connection.

Rick Jones