Linux SocketCAN behaviour of recvmsg

Question

I'm writing a CAN logger program. It logs the data much like the candump tool does when invoked as candump any: https://github.com/linux-can/can-utils/blob/master/candump.c

candump any makes candump bind to any device, i.e. addr.can_ifindex = 0; it then uses recvmsg to obtain a CAN frame and reads the timestamp attached to the struct msghdr msg, which it writes to the log file or to the screen.

My question is: does the kernel ensure that the following assert always holds?

struct msghdr msg;
// init stuff
// ...
s[0] = _skt_1; // can0
s[1] = _skt_2; // can1
// configure and bind sockets
// ...
select(s[1]+1, &rdfs, NULL, NULL, NULL);
recvmsg(s[0], &msg, 0); // https://linux.die.net/man/2/recvmsg
timestamp_1 = getTimestamp(msg);
recvmsg(s[1], &msg, 0); // https://linux.die.net/man/2/recvmsg
timestamp_2 = getTimestamp(msg);
// Always valid?
assert(timestamp_1 < timestamp_2);

A hint to the source code location in the SocketCAN driver would be helpful too.

Don't use `<` and `>` between timestamps. To correctly handle rollover, you must subtract two timestamps (using unsigned arithmetic) and then you can compare the difference to a threshold. — Ben Voigt, Oct 22 '20 at 18:46
I don't understand what problem appears, when comparing timestamps that way? — JulianW, Oct 22 '20 at 21:15
`<` thinks that 20 January 2038 comes before 18 January 2038. — Ben Voigt, Oct 22 '20 at 21:50
You know, that we are talking about microseconds? Linux is taking a Unix timestamp while in CAN interrupt with µs precision. — JulianW, Oct 22 '20 at 22:10
With microseconds, the overflow happens a million times as often. You won't have to wait until 2038 to experience the bug. Don't treat timestamps as absolute values, treat them as relative. — Ben Voigt, Oct 22 '20 at 22:15
I really don't get what you are trying to say. Which overflow? A Unix timestamp is already relative. But perhaps you can give a short example? — JulianW, Oct 22 '20 at 22:50
read this: https://www.kernel.org/doc/Documentation/networking/timestamping.txt but generally no. Timestamps could be provided by the hardware which could - for whatever reason - give you random numbers. — sneusse, Oct 23 '20 at 12:52
@sneusse I couldn't find anything satisfactory in the `timestamping` documentation. The nearest I could find is "(not necessarily monotonic)". But in my opinion this only means that the stack is not re-sorted after a new frame with an older timestamp is received (this can happen when NTP kicks in while logging). My question is rather about what happens if the interrupts of two CAN controllers are handled by different CPUs, and whether it makes a difference to pin both interrupts to the same CPU. And does the assertion hold in either of these cases? — JulianW, Oct 23 '20 at 13:35
@JulianH: 1.3.1 Timestamp Generation states when the flag SOF_TIMESTAMPING_RX_HARDWARE is present, timestamps may be generated by the hardware. If your hardware could e.g. only supply 32bit timestamps or generates individual timestamps by channel or whatever this wouldn't work. This could be handled in the device driver or not. Maybe I didn't understand the question totally, could you elaborate why you care about this a little bit more? — sneusse, Oct 23 '20 at 17:23

score 2 · Accepted Answer · answered Oct 23 '20 at 18:14

The short answer is yes, unless your driver does something very weird. CAN uses the same netif subsystem that other network devices use. There are a few ways that the SKB gets a timestamp.

HW Timestamps:

If your driver uses hardware time stamps, then time stamps are based on whatever the hardware provides.

SW Timestamps:

If netdev_tstamp_prequeue is enabled, then a timestamp is applied soon after your driver submits the skb to netif_receive_skb.

https://elixir.bootlin.com/linux/v4.14.202/source/net/core/dev.c#L4554

If netdev_tstamp_prequeue is not enabled then the timestamp is applied after a bit more processing but still in the same NAPI receive thread.

https://elixir.bootlin.com/linux/v4.14.202/source/net/core/dev.c#L4352
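
To find out which of the two software paths applies on a given machine, the setting can be read from procfs. This is only a minimal sketch: it assumes the kernel exposes the setting as /proc/sys/net/core/netdev_tstamp_prequeue, and read_int_sysctl is a hypothetical helper name, not part of any library.

```c
#include <stdio.h>

/* Reads a sysctl file that holds a single integer.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 * Assumed example path (not guaranteed to exist on every kernel):
 *   /proc/sys/net/core/netdev_tstamp_prequeue */
int read_int_sysctl(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int value = -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;   /* file was empty or did not start with a number */
    fclose(f);
    return value;
}
```

A call such as read_int_sysctl("/proc/sys/net/core/netdev_tstamp_prequeue") should return 1 when the timestamp is taken before queueing and 0 when it is taken later; -1 means the file could not be read.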

Here is the fuzzy part:

There are special modes (RPS/RFS, i.e. Receive Packet Steering and Receive Flow Steering) that allow the kernel to load balance skb processing with SMP. Instead of processing the skb in the NAPI receive thread, the kernel puts the skb in a per cpu queue. Now if netdev_tstamp_prequeue is not enabled, the timestamp is added when it comes off the per cpu queue some time later. However, the documentation says the receive ordering is not modified, so timestamps should remain in order as well.
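
For reference, whichever of the paths above stamps the skb, a software timestamp reaches user space as ancillary data on recvmsg. The following is a rough sketch of how a logger might pull it out, assuming SO_TIMESTAMP was enabled on the socket with setsockopt beforehand; get_rx_timestamp is a hypothetical helper name, and the loop is similar in spirit to what candump does rather than a copy of it.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Copies the SO_TIMESTAMP value from msg's ancillary data into *tv.
 * Returns 0 on success, -1 if no such control message is present.
 * Assumes msg was filled in by recvmsg() on a socket with SO_TIMESTAMP on. */
int get_rx_timestamp(struct msghdr *msg, struct timeval *tv)
{
    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL; cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SO_TIMESTAMP) {
            /* memcpy avoids alignment assumptions about the control buffer */
            memcpy(tv, CMSG_DATA(cmsg), sizeof(*tv));
            return 0;
        }
    }
    return -1;
}
```

msg.msg_control and msg.msg_controllen have to point at a large enough buffer before every recvmsg call, since the kernel overwrites msg_controllen with the length it actually used.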

That's great, thanks. But there is one more thing I don't understand yet: does the kernel ensure that preemption does not destroy the correct order of the frames observed by `recvmsg` when interrupts are handled by different CPUs (e.g. can0 is pinned to CPU0 and can1 is pinned to CPU1)? "kernel puts the skb in a per cpu queue" implies, in my opinion, that pinning the interrupts to different CPUs could (due to preemption) lead to a wrong order when pulling the frames from those two queues? So I guess my best bet would be to pin both interrupts to the same CPU? I guess in the end I have to test anyway... — JulianW, Oct 23 '20 at 20:55

score 0 · Answer 2 · answered Nov 03 '20 at 14:22

I want to add something to the answer of @user14508498.

I finally made some measurements. If I pin the interrupts to different CPUs (e.g. CAN0 to CPU0 and CAN1 to CPU1), candump does indeed receive some CAN frames in non-chronological order, i.e. the assertion above does not always hold in this specific case. The timestamps are out of order by around 1-2 microseconds, at least on my system. I could not observe this when both interrupts are pinned to the same CPU.
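
For anyone who wants to repeat the measurement, pinning an interrupt can be done by writing a CPU number to the IRQ's affinity file in procfs. This is only a sketch, assuming the kernel provides /proc/irq/<irq>/smp_affinity_list and that the program runs as root; the IRQ numbers of the CAN controllers must be looked up in /proc/interrupts, and irq_affinity_path and pin_irq_to_cpu are hypothetical helper names.

```c
#include <stdio.h>

/* Builds "/proc/irq/<irq>/smp_affinity_list" in buf.
 * Returns 0 on success, -1 if irq is negative or buf is too small. */
int irq_affinity_path(int irq, char *buf, size_t len)
{
    if (irq < 0)
        return -1;
    int n = snprintf(buf, len, "/proc/irq/%d/smp_affinity_list", irq);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Restricts the given IRQ to a single CPU (needs root).
 * Returns 0 on success, -1 on any error. */
int pin_irq_to_cpu(int irq, int cpu)
{
    char path[64];
    if (cpu < 0 || irq_affinity_path(irq, path, sizeof(path)) != 0)
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "%d\n", cpu) > 0;
    /* The buffered write is flushed on fclose; a rejected value shows up here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling pin_irq_to_cpu for both CAN interrupts with the same CPU number corresponds to the configuration in which I did not see any reordering.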
