
Setup

I have two nodes connected to one CAN bus. The first node is a black-box, controlled by some real-time hardware. The second node is a Linux machine with attached PEAK-USB CAN controller:

+--------+               +----------+
| HW CAN |--- CAN BUS ---| Linux PC |
+--------+               +----------+

To investigate occasional frame loss, I want to reproduce the CAN arbitration process. To do that, I set the CAN bit rate to 125 kbit/s and flood the bus with random CAN frames at a 1 ms delay, checking the bus load with canbusload from can-utils. I also monitor CAN error frames by running candump can0,0~0,#ffffffff and the overall CAN statistics with ip -s -d link show can0:

26: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0
    can state ERROR-ACTIVE restart-ms 0
          bitrate 125000 sample-point 0.875
          tq 500 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
          pcan_usb: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
          clock 8000000
          re-started bus-errors arbit-lost error-warn error-pass bus-off
          0          0          0          0          0          0
    RX: bytes  packets  errors  dropped overrun mcast
    120880     15110    0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    234123     123412   0       0       0       0
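
As a rough check on the load figures, the time each frame occupies the bus can be estimated from the classic CAN frame layout. The sketch below assumes 11-bit identifiers and classic CAN (not CAN FD); the function names are illustrative and are not part of can-utils.

```c
#include <stdio.h>

/* Bits in a classic CAN data frame with an 11-bit ID and `dlc` data bytes,
 * including the 3-bit interframe space, before bit stuffing. */
unsigned frame_bits_nominal(unsigned dlc)
{
    return 47u + 8u * dlc;
}

/* Worst case: stuff bits can be inserted into the 34 + 8*dlc bits
 * between start-of-frame and the end of the CRC sequence. */
unsigned frame_bits_worst(unsigned dlc)
{
    return frame_bits_nominal(dlc) + (34u + 8u * dlc - 1u) / 4u;
}

/* Time on the wire, in microseconds, for `bits` at `bitrate` bit/s. */
unsigned frame_time_us(unsigned bits, unsigned bitrate)
{
    return (unsigned)((1000000ull * bits) / bitrate);
}

/* Print the estimate for one sender (hypothetical helper). */
void print_frame_budget(const char *name, unsigned dlc, unsigned bitrate)
{
    printf("%s: %u..%u bits, %u..%u us per frame\n", name,
           frame_bits_nominal(dlc), frame_bits_worst(dlc),
           frame_time_us(frame_bits_nominal(dlc), bitrate),
           frame_time_us(frame_bits_worst(dlc), bitrate));
}
```

At 125 kbit/s these formulas give 111 to 135 bits (888 to 1080 µs) for the 8-byte frame sent by the hardware node and 63 to 75 bits (504 to 600 µs) for the 2-byte frame sent by the Linux node.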

Problem

The problem is that this setup runs for hours at 99% bus load with zero collisions (lost arbitration) and no error frames of any other kind. When I reduce the delay to increase the bus load, write(2) fails with either "ENOBUFS 105 No buffer space available" or "EAGAIN 11 Resource temporarily unavailable". Which error I get depends on whether I modify the qlen parameter or leave it at its default.
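
One way to keep the sender running when the transmit queue fills is to pause briefly and retry instead of exiting. This is a minimal sketch: the helper name, the 100 µs pause, and the retry limit are arbitrary assumptions, and it does not by itself make the nodes contend for arbitration.

```c
#include <errno.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Write `len` bytes to `fd`, retrying while the transmit queue is full.
 * Returns the write(2) result, or -1 with errno set by write(2) when
 * another error occurs or `max_retries` retries have been used up. */
ssize_t write_with_retry(int fd, const void *buf, size_t len, int max_retries)
{
    /* 100 us between attempts; an arbitrary value chosen for illustration. */
    const struct timespec pause = { .tv_sec = 0, .tv_nsec = 100 * 1000 };

    for (int attempt = 0; ; attempt++) {
        ssize_t n = write(fd, buf, len);
        if (n >= 0)
            return n;
        if ((errno != ENOBUFS && errno != EAGAIN) || attempt >= max_retries)
            return -1;
        nanosleep(&pause, NULL);
    }
}
```

In the Linux node's loop this would stand in for the bare write() call, for example write_with_retry(s, &frame, sizeof(frame), 1000).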

As I understand it, the load I generate is either too low or too high. What is the right way to make the two nodes enter arbitration? A successful result would be a received CAN error frame corresponding to the CAN_ERR_LOSTARB constant from can/error.h and a collsns value other than 0.
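
For reference, the check for such an error frame could look roughly like the sketch below. It assumes the frame was read from a raw CAN socket on which error frames are enabled; the function name is hypothetical, while CAN_ERR_FLAG and CAN_ERR_LOSTARB come from the kernel headers.

```c
#include <linux/can.h>
#include <linux/can/error.h>

/* Returns 1 if `f` is an error frame reporting lost arbitration, else 0.
 * When it is, *bit (if non-NULL) receives data[0], the bit number at which
 * arbitration was lost (0 means unspecified). */
int is_lost_arbitration(const struct can_frame *f, unsigned *bit)
{
    if (!(f->can_id & CAN_ERR_FLAG))
        return 0;               /* ordinary data or remote frame */
    if (!(f->can_id & CAN_ERR_LOSTARB))
        return 0;               /* some other error class */
    if (bit)
        *bit = f->data[0];
    return 1;
}
```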

Source code

HW Node (Arduino Due with CAN board)

#include <due_can.h>

CAN_FRAME input, output;

// the setup function runs once when you press reset or power the board
void setup() {
  Serial.begin(9600);
  Serial.println("start");

//  Can0.begin(CAN_BPS_10K);
  Can0.begin(CAN_BPS_125K);
//  Can0.begin(CAN_BPS_250K);


  output.id = 0x303;
  output.length = 8;
  output.data.low = 0x12abcdef;
  output.data.high = 0x24abcdef;
}

// the loop function runs over and over again forever
void loop() {    
    Can0.sendFrame(output);
    Can0.read(input);

    delay(1);
}

Linux node

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#include <net/if.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>

#include <linux/can.h>
#include <linux/can/raw.h>

int main(int argc, char *argv[])
{
  int s;
  int nbytes;
  struct sockaddr_can addr;
  struct can_frame frame;
  struct ifreq ifr;

  const char *ifname = "can0";

  if((s = socket(PF_CAN, SOCK_RAW, CAN_RAW)) < 0) {
    perror("Error while opening socket");
    return -1;
  }

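  /* Resolve the interface name to its index and check the result. */
  memset(&ifr, 0, sizeof(ifr));
  strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
  if (ioctl(s, SIOCGIFINDEX, &ifr) < 0) {
    perror("Error in ioctl SIOCGIFINDEX");
    return -1;
  }

  memset(&addr, 0, sizeof(addr));
  addr.can_family  = AF_CAN;
  addr.can_ifindex = ifr.ifr_ifindex;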

  printf("%s at index %d\n", ifname, ifr.ifr_ifindex);

  if(bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
    perror("Error in socket bind");
    return -2;
  }

  frame.can_id  = 0x304;
  frame.can_dlc = 2;
  frame.data[0] = 0x11;
  frame.data[1] = 0x22;

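  if (argc < 2) {
    fprintf(stderr, "usage: %s <delay in ms>\n", argv[0]);
    return 1;
  }

  /* usleep() takes microseconds, so convert the millisecond argument. */
  int sleep_us = atoi(argv[1]) * 1000;

  for (;;) {
    nbytes = write(s, &frame, sizeof(struct can_frame));
    if (nbytes == -1) {
      perror("write");
      return 1;
    }
    usleep(sleep_us);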
  }
  return 0;
}
Alexander Solovets
  • Could you give us the code that generates the random CAN frames and the ID your black-box node uses? Normally at 99% bus load you should see errors. I suspect that only your USB device is transmitting and your black box is never able to send. – Benoît Jul 18 '19 at 08:02
  • @Benoît I posted the source code of both nodes. Also, `ip show` clearly shows that there are transmitted and received packets. – Alexander Solovets Jul 18 '19 at 08:39
  • Try the following : `can_err_mask_t err_mask = CAN_ERR_MASK;` `setsockopt(socket_can, SOL_CAN_RAW, CAN_RAW_ERR_FILTER, &err_mask, sizeof(err_mask));` (before the bind) – Benoît Jul 18 '19 at 11:41
  • @Benoît No luck – Alexander Solovets Jul 18 '19 at 12:06
  • From the [documentation](https://www.kernel.org/doc/Documentation/networking/can.txt), subsection 4.1.2 "RAW socket option CAN_RAW_ERR_FILTER": errors are not enabled by default, which is why the LOSTARB field in `ip` was not increasing. The two lines I gave you should have enabled those errors, but it's possible that the driver or firmware isn't compatible. I have read on a [PEAK forum](https://www.peak-system.com/forum/viewtopic.php?f=41&t=2825) that if the firmware of your USB device has a version < 4.x there is no "selfreceive feature", so you won't be able to detect LOSTARB. – Benoît Jul 18 '19 at 14:30
  • "To do that I am setting the CAN bit-rate to 125Kb/s and flooding it with random CAN frames with 1ms delay" That's a very weak test. CAN is designed to work with 100% bus load: the arbitration "CSMA/CA" means you get collision avoidance even at 100% load. As for spurious arbitration errors, there are two things I can think of, the most obvious being the notorious missing signal ground. A rarer case is when two nodes send the same identifier (the arbitration field, more precisely) but with different payloads at the same time. I know nothing of the Linux fluff though; it could be the problem. – Lundin Jul 19 '19 at 12:51

1 Answer


According to subsection 4.1.2 of the documentation, "RAW socket option CAN_RAW_ERR_FILTER", errors are not enabled by default, which is why the lost arbitration field in ip was not increasing.

To enable all error classes, add these two lines:

can_err_mask_t err_mask = CAN_ERR_MASK;
setsockopt(socket_can, SOL_CAN_RAW, CAN_RAW_ERR_FILTER, &err_mask, sizeof(err_mask));

However, this feature is not available for all drivers and devices, because it requires the hardware to have a loopback mode. In the case of the PEAK-USB, it seems that if the device's firmware version is below 4.x, there is no loopback [source](https://www.peak-system.com/forum/viewtopic.php?f=41&t=2825), so SocketCAN won't be able to detect lost arbitration.

Benoît