
I am trying to measure the latency between a packet arriving in the Rx buffer and being copied to application memory. I am measuring it with this code:

struct timespec start, end;

clock_gettime(CLOCK_REALTIME, &start);
recvfrom(sock, msg, msg_len, 0, &client, &client_addrlen);
clock_gettime(CLOCK_REALTIME, &end);

I know this cannot measure the latency precisely. However, I can estimate the average latency by receiving many packets, timing each one, and averaging the results. Is there a more precise way to measure the latency? (e.g., latency = (time when recvfrom() is done) - (time when NIC receives a packet from))

The NIC is a Mellanox ConnectX-3, and the driver is mlx4_en.

Won
  • This code measures the latency of copying data from kernel-owned memory into application-owned memory. It has nothing to do with general packet latency. – SergeyA Nov 08 '21 at 18:57
  • The formula `latency = (time when recvfrom() is done) - (time when NIC receives a packet from)` is wrong: it calculates the time it takes for the kernel to acknowledge that the packet was received, copy it into its own memory (probably done by the NIC with DMA), and then copy it to userspace. – Marco Bonelli Nov 08 '21 at 19:01
  • Oh, I think I caused a misunderstanding. I am not measuring RTT between two hosts. I am measuring exactly the latency you describe; I am only interested in the receiver side. That is why the start time is when the hardware receives the packet (if that can be measured) or when the kernel acknowledges its arrival, and the end time is when the packet has been copied to userspace. So I don't think I am using the wrong formula? – Won Nov 08 '21 at 19:10
  • Since recvfrom() is a blocking function and I take the start timestamp in userspace, I don't think I can capture the time when a packet actually arrives at the device. I am just wondering whether I can capture the packet arrival time precisely. – Won Nov 08 '21 at 19:18
  • Perhaps a bit complicated, but you could use eBPF/XDP to measure the time. XDP hooks are triggered just after receiving a packet from the NIC/Driver but before it goes into the network stack. You would have to match the packet you get in userspace to the packet you got in the kernel, but it is doable. You might be able to do the same thing with kprobes. – Dylan Reimerink Nov 08 '21 at 19:22
  • Thanks! I will check and see what I can do for this job – Won Nov 08 '21 at 19:46
  • Review [Precision Time Protocol](https://en.wikipedia.org/wiki/Precision_Time_Protocol#Synchronization) for ideas. Have the sender include the time at which the packet was sent and record its arrival. Then do the reverse: send a time-stamped packet and ask the receiver to report when it was received. With these 4 timestamps, packet latency can be deduced, as well as clock synchronization. – chux - Reinstate Monica Nov 08 '21 at 20:21
  • You need an adapter that can generate timestamps in HW. I think the Mellanox adapter can do that. See https://docs.mellanox.com/display/MLNXOFEDv451010/Time-Stamping – Support Ukraine Nov 08 '21 at 20:37
  • If you are actually measuring the time the `recvfrom( )` takes to execute when a packet has already been received by the stack (even though I don't exactly understand why) you could call [select( )](https://linux.die.net/man/2/select) to start measuring time when there's already data in descriptor `sock`. – Roberto Caboni Nov 08 '21 at 20:54

1 Answer


I was able to get a reasonably precise number with recvmsg().

Reference

Code

I am reproducing the code from the first link. It is not ready to run as-is; it is a snippet taken from working code.

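// Return the software receive timestamp (ts[0]) from the ancillary data,
// or a zeroed timespec if no SCM_TIMESTAMPING message is present.
static struct timespec handle_time(struct msghdr *msg) {
    struct timespec none = {0, 0};
    for (struct cmsghdr *cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_TIMESTAMPING)
            return ((struct scm_timestamping *)CMSG_DATA(cmsg))->ts[0];
    }
    return none;
}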

...
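char ctrl[CMSG_SPACE(sizeof(struct scm_timestamping))];  // room for one timestamp cmsg
char *msg = malloc(msg_len);                             // payload buffer, matches iov_len below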

int val = SOF_TIMESTAMPING_RX_HARDWARE | SOF_TIMESTAMPING_RX_SOFTWARE
        | SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_RAW_HARDWARE;

setsockopt(sock_fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));

// user buffer
struct iovec iov = {
    .iov_base = msg,
    .iov_len = msg_len,
};

// message header, including the ancillary (control) buffer
struct msghdr m = {
    .msg_name = &client_addr,           // struct sockaddr_in
    .msg_namelen = client_addrlen,      // socklen_t
    .msg_iov = &iov,
    .msg_iovlen = 1,
    .msg_control = ctrl,
    .msg_controllen = sizeof(ctrl),
};

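while (1) {
    memset(msg, 0, msg_len);
    // recvmsg() overwrites these lengths, so reset them on every iteration
    m.msg_namelen = client_addrlen;
    m.msg_controllen = sizeof(ctrl);
    num_received = recvmsg(sock_fd, &m, 0);
    start = handle_time(&m);              // kernel receive timestamp
    clock_gettime(CLOCK_REALTIME, &end);  // time the data reached userspace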

    if (verbose) {
        double elapsed_time = time_diff(start, end) / 1000;
        total_elapsed += elapsed_time;
        count++;
        printf("%f us %f us\n", elapsed_time, total_elapsed / count);
    }

    if (sendto(sock_fd, msg, msg_len, 0, (struct sockaddr *) &client_addr, client_addrlen) < 0) {
        perror("\nMessage Send Failed\n");
        fprintf(stderr, "Value of errno: %d\n", errno);
    }
}
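The loop calls time_diff(), which the snippet does not show. Below is a minimal sketch of what such a helper might look like. It assumes the helper returns end minus start in nanoseconds, which is what the division by 1000 and the "us" label in the printf suggest; the original implementation may differ.

```c
#include <assert.h>
#include <time.h>

/* Assumed behaviour of the time_diff() helper used in the loop:
 * elapsed time from start to end, in nanoseconds. */
static double time_diff(struct timespec start, struct timespec end)
{
    /* Both inputs are expected to be normalized timespec values. */
    assert(start.tv_nsec >= 0 && start.tv_nsec < 1000000000L);
    assert(end.tv_nsec >= 0 && end.tv_nsec < 1000000000L);

    /* Subtract seconds and nanoseconds separately; a negative nanosecond
     * difference is absorbed by the seconds term. */
    return (double)(end.tv_sec - start.tv_sec) * 1e9
         + (double)(end.tv_nsec - start.tv_nsec);
}
```

Because the software timestamp in ts[0] and clock_gettime(CLOCK_REALTIME) use the same clock, their difference is meaningful without any extra conversion.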

The key point is to use setsockopt() together with recvmsg(). Once SO_TIMESTAMPING is set on a socket FD, the kernel timestamps incoming packets according to the flags you passed. When you then receive a message through a struct msghdr, the kernel returns the software or hardware timestamp as ancillary data. That data can hold up to three timestamps, as the kernel documentation explains:

The structure can return up to three timestamps. This is a legacy feature. At least one field is non-zero at any time. Most timestamps are passed in ts[0]. Hardware timestamps are passed in ts[2]. ts[1] used to hold hardware timestamps converted to system time. Instead, expose the hardware clock device on the NIC directly as a HW PTP clock source, to allow time conversion in userspace and optionally synchronize system time with a userspace PTP stack such as linuxptp. For the PTP clock API, see Documentation/driver-api/ptp.rst.

See section 2.1 of Documentation/networking/timestamping.txt for details.
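To make the ts[0] / ts[2] distinction concrete, here is a small sketch that picks a timestamp out of a struct scm_timestamping and reports which kind it is. The helper names ts_is_set() and pick_rx_timestamp() are hypothetical and not part of any kernel or libc API, and preferring the hardware timestamp when present is only one possible policy.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#include <linux/errqueue.h>   /* struct scm_timestamping */

/* Hypothetical helper: a timestamp slot counts as filled if it is non-zero. */
static int ts_is_set(const struct timespec *t)
{
    return t->tv_sec != 0 || t->tv_nsec != 0;
}

/* Hypothetical helper: return the hardware timestamp (ts[2]) if the NIC
 * supplied one, otherwise the software timestamp (ts[0]), otherwise NULL.
 * *is_hw is set to 1 for a hardware timestamp and 0 otherwise. */
static const struct timespec *
pick_rx_timestamp(const struct scm_timestamping *t, int *is_hw)
{
    assert(t != NULL && is_hw != NULL);
    *is_hw = 0;
    if (ts_is_set(&t->ts[2])) {   /* raw hardware time, NIC clock domain */
        *is_hw = 1;
        return &t->ts[2];
    }
    if (ts_is_set(&t->ts[0]))     /* software time, system clock */
        return &t->ts[0];
    return NULL;                  /* nothing was reported */
}
```

A hardware timestamp is taken from the NIC's own clock, so subtracting it from a CLOCK_REALTIME reading only makes sense when that clock is synchronized with system time. The code in this answer reads ts[0], which avoids that issue.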

If you want hardware timestamps, you need a NIC that supports them (see this comment), and you must enable the feature with ioctl(). There is also a convenient tool, linuxptp, that does this for you.
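For reference, the ioctl() step might look roughly like the sketch below. It uses the SIOCSHWTSTAMP request with a struct hwtstamp_config from <linux/net_tstamp.h>. The function name enable_hw_rx_timestamping() is hypothetical, the interface name is whatever your NIC is called, and whether HWTSTAMP_FILTER_ALL is accepted depends on the driver, so treat this as an untested outline rather than the exact code I ran. The call normally requires root privileges.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>            /* struct ifreq, IFNAMSIZ */
#include <linux/net_tstamp.h>  /* struct hwtstamp_config, HWTSTAMP_* */
#include <linux/sockios.h>     /* SIOCSHWTSTAMP */

/* Hypothetical helper: ask the driver behind `ifname` to hardware-timestamp
 * all received packets. `fd` can be any open socket.
 * Returns 0 on success, -1 on failure with errno set. */
static int enable_hw_rx_timestamping(int fd, const char *ifname)
{
    struct hwtstamp_config cfg;
    struct ifreq ifr;

    assert(ifname != NULL);

    memset(&cfg, 0, sizeof(cfg));
    cfg.tx_type = HWTSTAMP_TX_OFF;        /* transmit timestamps not needed */
    cfg.rx_filter = HWTSTAMP_FILTER_ALL;  /* timestamp every received packet */

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);  /* stays NUL-terminated */
    ifr.ifr_data = (char *)&cfg;

    return ioctl(fd, SIOCSHWTSTAMP, &ifr);
}
```

After a successful call, hardware timestamps should show up in ts[2], provided the socket was configured with SOF_TIMESTAMPING_RX_HARDWARE and SOF_TIMESTAMPING_RAW_HARDWARE as in the code above.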

Won