
I want to ask about supporting lockstep (lock-step) processors at the software level.

As far as I know, in AUTOSAR ASIL-D systems a lockstep processor is used for fault tolerance, following the scenario below.

  1. The input signals for a processor are copied to another processor (its lockstep pair).

  2. The output signals from the two processors are compared.

  3. If the two output signals differ, a trap is generated.
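
The three steps above can be sketched as a small host-side simulation. This is only an illustration of the principle: real lockstep comparison is done in hardware, and the names `compute`, `lockstep_step`, and `fault_mask` are hypothetical, not taken from AUTOSAR or any MCU manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical computation that both cores execute on the same input. */
static uint32_t compute(uint32_t input)
{
    return input * 3u + 1u;
}

/* Simulated lockstep step. fault_mask is XORed into the checker core's
 * output to model a hardware fault (0 means no fault). Returns true if
 * the comparator would raise a trap. */
bool lockstep_step(uint32_t input, uint32_t fault_mask, uint32_t *output)
{
    /* Step 1: the same input is given to both cores. */
    uint32_t main_out = compute(input);
    uint32_t checker_out = compute(input) ^ fault_mask;

    *output = main_out;

    /* Steps 2 and 3: compare the outputs; a mismatch means "trap". */
    return main_out != checker_out;
}
```

With `fault_mask` set to 0 the function reports no trap; any non-zero mask makes the two outputs differ, so the simulated comparator reports one.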

I think that if a trap is generated, it should be handled somewhere at the software level. However, I could not find any standard that covers this. I have read some of the AUTOSAR material on error handling in software, but I could not find a satisfying answer.

So, my questions can be summarized as follows.

  • In AUTOSAR or another standard, where is the right place to handle a lockstep trap (SW-C, RTE, or BSW)?
  • In AUTOSAR or another standard, what is the right action to take on a lockstep trap (RESET or ABORT)?

Thank you.

1 Answer


There are multiple concepts involved here, from different sources.

The ASIL levels are defined by ISO 26262. ASIL-D is the highest level and using a lockstep CPU is one of the methods typically used to achieve ASIL-D compliance for the whole system. Autosar doesn't define how you achieve ASIL-D, or any ASIL level at all. From an Autosar perspective, lockstep would be an implementation detail of the MCU driver, and Autosar doesn't require MCUs to support lockstep. How a particular lockstep implementation works (whether the outputs are compared after each instruction or not, etc.) depends on the hardware, so you can find those answers in the corresponding hardware manual.

Correspondingly, some decisions have to be made by people working on the system, including an expert on functional safety. The decision on what to do on lockstep failure is one such decision - how you react to a lockstep trap should be defined at the system level. This is also not defined by Autosar, although the most reasonable option is to reset your microcontroller after saving some information about the error.

As for where in the Autosar stack the trap should be handled, this is also an implementation decision, although the reasonable choice is for this to happen at the MCAL level - to the extent that talking about levels even makes sense here, as the trap will run in interrupt/trap context and not the normal OS task context. Typically, a trap would come with a higher priority than any interrupt, and also typically it's not possible to disable the traps in software. A trap will be handled by some routine that is registered by the OS in the same way it registers ISRs, so you'd want to configure the trap handler in whatever tool you're using for OS configuration. The lockstep trap may (again, depending on the hardware) be considered a non-recoverable trap, meaning that the trap handler should trigger a reset eventually. Calling the standard ShutdownOS() function may be reasonable.
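
A minimal sketch of such a handler, assuming the "save some information, then reset" policy described above. Everything here is hypothetical and host-runnable for illustration: the `TrapRecord` layout, `lockstep_trap_handler`, and `request_reset` are made-up names, and on a real target the record would live in RAM that survives a reset while the reset hook would call something like `ShutdownOS()` or a vendor-specific function.

```c
#include <stdint.h>

#define TRAP_RECORD_MAGIC 0x4C53544Bu  /* arbitrary marker for a valid record */

/* Hypothetical error record. On a real target this would be placed in RAM
 * that is not cleared by a reset (a linker-script detail not shown here). */
typedef struct {
    uint32_t magic;
    uint32_t trap_id;
    uint32_t program_counter;
    uint32_t count;
} TrapRecord;

static TrapRecord g_trap_record;
static uint32_t g_reset_requests;  /* host-side stand-in for the reset hardware */

/* Hypothetical reset hook. On a real ECU this would not return. */
static void request_reset(void)
{
    g_reset_requests++;
}

/* Hypothetical lockstep trap handler: save minimal diagnostics, then reset. */
void lockstep_trap_handler(uint32_t trap_id, uint32_t program_counter)
{
    if (g_trap_record.magic != TRAP_RECORD_MAGIC) {
        /* No valid record yet: start counting from zero. */
        g_trap_record.count = 0u;
        g_trap_record.magic = TRAP_RECORD_MAGIC;
    }
    g_trap_record.trap_id = trap_id;
    g_trap_record.program_counter = program_counter;
    g_trap_record.count++;

    request_reset();
}

/* Accessors that startup code (or a test) can use to inspect the state. */
const TrapRecord *trap_record_get(void)
{
    return &g_trap_record;
}

uint32_t reset_request_count(void)
{
    return g_reset_requests;
}
```

The handler is kept deliberately short, since after a lockstep mismatch the CPU state may not be trustworthy. After the reset, startup code could check `magic` and report the saved record through whatever diagnostic mechanism the system uses.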

DUman
  • I mostly agree, but I always wonder why almost everyone thinks of a reset first as the functional safety recovery method. The first step would be to throw away the data and try to recover from old data if possible. Especially on bigger processors with multiple cores and external flash, recovering from a reset can take ages, even more so now that security and secure boot are involved. During that time your ECU is not communicating, not even reporting a failure, so other ECUs might just run into timeouts, especially in ASIL-B, where no redundancy is required. – kesselhaus Sep 15 '18 at 07:58
  • @kesselhaus You don't reset on just getting a message with an invalid checksum; you reset when you detect a software/hardware fault that shouldn't occur in the first place. Then a reset is the safest option because it returns you to a known, very well-tested state, avoiding whatever side effects the one-in-ten-million glitch might have caused. Other ECUs running into timeouts is absolutely not a safety problem; any ECU should have fallback behavior for when it's not receiving data that it expects. – DUman Sep 15 '18 at 13:20
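
The fallback behavior mentioned in the last comment could look roughly like the following: a receiver that substitutes a default value when an expected signal has not arrived within its timeout. This is a sketch under assumptions, not an AUTOSAR API; `SignalMonitor`, `signal_on_receive`, and `signal_read` are hypothetical names, and the millisecond tick is assumed to come from some system timer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical monitor for one periodically received signal. */
typedef struct {
    uint32_t last_rx_ms;      /* time of the last reception */
    uint32_t timeout_ms;      /* maximum allowed age of the value */
    int32_t  last_value;      /* most recently received value */
    int32_t  fallback_value;  /* substitute used when the value is stale */
    bool     ever_received;
} SignalMonitor;

/* Call whenever a fresh value arrives from the other ECU. */
void signal_on_receive(SignalMonitor *m, int32_t value, uint32_t now_ms)
{
    m->last_value = value;
    m->last_rx_ms = now_ms;
    m->ever_received = true;
}

/* Returns the received value, or the fallback if it is missing or stale. */
int32_t signal_read(const SignalMonitor *m, uint32_t now_ms)
{
    /* Unsigned subtraction stays correct across tick counter wrap-around. */
    if (!m->ever_received ||
        (uint32_t)(now_ms - m->last_rx_ms) > m->timeout_ms) {
        return m->fallback_value;
    }
    return m->last_value;
}
```

While the sending ECU is resetting, the receiver keeps operating on `fallback_value` and picks up the live value again once messages resume.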