I use a Raspberry Pi with a PiCAN board, which uses an MCP2515 CAN controller.
I use SocketCAN to read and write CAN messages from an application I wrote.
After running for a few weeks without a problem, the controller is now in the state "STOPPED". What is the difference between the states STOPPED and BUS-OFF?
Does a device enter the BUS-OFF state if too many errors occur on the CAN bus, and the STOPPED state if you set the device down (ip link set canX down)?
Are there any other ways the device can enter the STOPPED state? I could not find any way in which my application might have set the device down.

ip -details -statistics link show can0
3: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 10
   link/can  promiscuity 0
   can state STOPPED restart-ms 100
      bitrate 250000 sample-point 0.875
      tq 250 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
      mcp251x: tseg1 3..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
      clock 8000000
      re-started bus-errors arbit-lost error-warn error-pass bus-off
      0          0          0          146        139        0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
   RX: bytes  packets  errors  dropped overrun mcast
   787700920  151606570 24      0       24      0
   TX: bytes  packets  errors  dropped carrier collsns
   6002905    5895301  0       0       0       0 
Timonysos

2 Answers

I had the STOPPED problem in our application running on an NXP iMX6. It happened when many applications were started, each of them configuring the CAN interface from scratch. Changing settings is a sequence of interface down, change, interface up, which in the CAN kernel modules ends up as many close and open calls. At some point the open call began to fail every time, and canconfig or ip link reported the STOPPED state. Only a reboot helped then.

The NXP iMX6 uses the kernel's flexcan module, and in the kernel we use (toradex_4.14-2.3.x-imx) that module mishandles the return value of pm_runtime_get_sync(): it treats the call as failed even when the return value is positive. Other modules interpret only negative values as errors.
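
To illustrate the difference between the two checks, here is a small userspace sketch. It is not kernel code: `struct fake_dev`, `fake_pm_runtime_get_sync()`, `fake_pm_runtime_put_noidle()`, `open_old()` and `open_fixed()` are hypothetical stand-ins. The sketch only assumes the documented contract that pm_runtime_get_sync() always increments the usage counter and returns a negative value on error, 0 after a resume, and 1 if the device was already active.

```c
/* Userspace model only; every name here is a hypothetical stand-in. */
struct fake_dev {
    int usage_count; /* models the runtime PM usage counter */
};

/* Models pm_runtime_get_sync(): the counter is incremented in every case,
 * and `result` plays the role of the return value
 * (<0 error, 0 resumed, 1 already active). */
int fake_pm_runtime_get_sync(struct fake_dev *d, int result)
{
    d->usage_count++;
    return result;
}

/* Models pm_runtime_put_noidle(): drops the counter without idling. */
void fake_pm_runtime_put_noidle(struct fake_dev *d)
{
    d->usage_count--;
}

/* Old check: any non-zero value, including the harmless 1, aborts the
 * open and leaves the usage counter incremented. */
int open_old(struct fake_dev *d, int result)
{
    int err = fake_pm_runtime_get_sync(d, result);
    if (err)
        return err;
    return 0;
}

/* Fixed check: only negative values are errors, and the counter is
 * rebalanced before returning. */
int open_fixed(struct fake_dev *d, int result)
{
    int err = fake_pm_runtime_get_sync(d, result);
    if (err < 0) {
        fake_pm_runtime_put_noidle(d);
        return err;
    }
    return 0;
}
```

With a return value of 1, `open_old()` reports a failure while `open_fixed()` succeeds. With a negative value both fail, but only `open_fixed()` leaves the counter where it was.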

So I changed the flexcan module, and so far it works. My patch is included below. Even though the Raspberry Pi uses a different module and possibly another kernel version, you could check whether the same issue applies there.

Good luck, Solon

From 1142dfb7b9f26e5882724689e2d110d09479714e Mon Sep 17 00:00:00 2001
From: Solon
Date: Mon, 28 Aug 2023 08:11:56 +0200
Subject: [PATCH] can: flexcan: handle pm_runtime_get_sync() errors

Modelled after other can modules.  Only negative return values from
pm_runtime_get_sync() are considered errors and lead to a call to
pm_runtime_put_noidle() to sync the power management for the next call.
---
 drivers/net/can/flexcan.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index 550ae1f8b318..a8a40ef59c70 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -1479,8 +1479,11 @@ static int flexcan_open(struct net_device *dev)
    int err;
 
    err = pm_runtime_get_sync(priv->dev);
-   if (err)
+   if (err < 0) {
+       pm_runtime_put_noidle(priv->dev);
+       dev_err(priv->dev, "%s: pm_runtime_get_sync failed(%d)\n", __func__, err);
        return err;
+   }
 
    err = open_candev(dev);
    if (err)
@@ -1855,7 +1858,8 @@ static int flexcan_probe(struct platform_device *pdev)
    pm_runtime_enable(&pdev->dev);
    err = pm_runtime_get_sync(&pdev->dev);
    if (err < 0) {
-       dev_err(&pdev->dev, "pm_runtime_get failed(%d)\n", err);
+       pm_runtime_put_noidle(&pdev->dev);
+       dev_err(&pdev->dev, "%s: pm_runtime_get_sync failed(%d)\n", __func__, err);
        goto failed_rpm_disable;
    }
 
-- 
2.30.2
Solon

You need to familiarize yourself with the ERROR ACTIVE, ERROR PASSIVE, and BUS OFF error states of CAN bus devices, and with when CAN communication has to be restarted manually.
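
As a rough sketch of how these error states relate to a node's error counters, the following C function classifies a node from its transmit and receive error counters, using the usual CAN thresholds (warning at 96, error passive above 127, bus off above 255 on the transmit counter). The enum and the function name `classify()` are made up for this illustration. Note that STOPPED does not appear here: as far as I know it is not derived from the error counters at all, but is the state Linux CAN drivers report while the interface is not started.

```c
/* Illustrative names only; they mirror the states printed by
 * `ip -details link show`, but are not taken from any header. */
enum can_err_state {
    ERROR_ACTIVE,
    ERROR_WARNING, /* still error active, but past the warning limit */
    ERROR_PASSIVE,
    BUS_OFF
};

/* Classify a node from its transmit (tec) and receive (rec)
 * error counters. */
enum can_err_state classify(unsigned int tec, unsigned int rec)
{
    if (tec > 255)
        return BUS_OFF;       /* only the transmit counter leads to bus off */
    if (tec > 127 || rec > 127)
        return ERROR_PASSIVE;
    if (tec >= 96 || rec >= 96)
        return ERROR_WARNING;
    return ERROR_ACTIVE;
}
```

For example, `classify(0, 128)` yields `ERROR_PASSIVE` and `classify(256, 0)` yields `BUS_OFF`.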

All relevant info can be found at one of these links:

http://www.can-wiki.info/doku.php?id=can_faq:can_faq_erors

http://www.port.de/cgi-bin/CAN/CanFaqErrors

avra
  • SO recommends pasting the content into your answer rather than only providing links (which can go dead). Also, the links you provided don't explain why STOPPED and DOWN are shown (assuming the interface was commanded up) – TSG Nov 13 '22 at 19:12
  • If anyone needs info about SocketCAN bus errors, they can compile and use my hlcanerrdump utility or study its source here: https://forum.lazarus.freepascal.org/index.php/topic,39858.msg402234.html#msg402234. – avra Nov 14 '22 at 22:38
  • You can also find my hlcanerrsim utility there, which can simulate SocketCAN errors. Combining these two tools gives a better understanding of CAN error messaging. Here is the link: https://forum.lazarus.freepascal.org/index.php/topic,39858.msg403874.html#msg403874 – avra Nov 14 '22 at 22:49