Safely Exiting to a Particular State in Case of Error

Question

When writing code I often have checks to see if errors occurred. An example would be:

char *x = malloc( some_bytes ); 
if( x == NULL ){
    fprintf( stderr, "Malloc failed.\n" ); 
    exit(EXIT_FAILURE); 
}

I've also used strerror( errno ) in the past.

I've only ever written small desktop applications where it doesn't matter if the program exit()ed in case of an error.

Now, however, I'm writing C code for an embedded system (Arduino) and I don't want the system to just exit in case of an error. I want it to go to a particular state/function where it can power down systems, send error reports and idle safely.

I could simply call an error_handler() function, but I could be deep in the stack and very low on memory, leaving error_handler() inoperable.

Instead, I'd like execution to effectively collapse the stack, free up a bunch of memory and start sorting out powering down and error reporting. There is a serious fire risk if the system doesn't power down safely.

Is there a standard way that safe error handling is implemented in low memory embedded systems?

EDIT 1: I'll limit my use of malloc() in embedded systems. In this particular case, the errors would occur when reading a file, if the file was not of the correct format.

What about keeping global `char success = 0;` and `if(success == 1) break;`? You could use `longjmp`, but that's ugly. — , Sep 12 '15 at 09:35
@Xis88: That's *just a little ugly*, but hey! He's in the embedded world! — 3442, Sep 12 '15 at 09:35
@KemyLand: There is no actual need in the embedded world for `setjmp` either. In fact, it is often a signal of bad system architecture. But as OP uses `malloc`, he likely has worse problems. — too honest for this site, Sep 12 '15 at 14:46
@Olaf I've never had experience programming C for embedded systems. Is there a resource you would recommend for an introduction to the do's and don'ts? — Rohan, Sep 12 '15 at 15:36
@Rohan: Sorry. I never read a book or so, but learned it from the very basics (mostly self-taught, more structured at university) over decades. The latter might be a good way, but only if it is a proper university (there are too many trash courses). — too honest for this site, Sep 12 '15 at 15:50
@Olaf, I fully agree about the 'trash courses'. Especially at the community colleges. EVERY programming course I have taken at a community college has consumed lots of my time and money, but not taught me how to actually write a programming language project from a blank sheet of paper. (I have attended several different community colleges) the courses are spoon fed pabulum that leave out the majority of the language. I have even attended C++ courses, that required C first. However the students did not know any C looping statements nor pointers, nor .... — user3629249, Sep 13 '15 at 16:16
in embedded systems, only malloc during initialization, and never invoke any of the malloc/free functions thereafter. I have found, through some 40 years of programming embedded systems, that if the coder thinks that a malloc/free sequence needs to be done at some mid point in the code, then the software architecture needs some serious re-design. — user3629249, Sep 13 '15 at 16:28
Since you are using Arduino, malloc and stdio, I take it you are not actually designing safety-critical systems. Maybe remove that tag since it is misleading. — Lundin, Sep 14 '15 at 06:22
@user3629249 Why use malloc at all? You shouldn't use malloc on bare bone microcontrollers, simply because _[it doesn't make any sense](http://electronics.stackexchange.com/questions/171257/realloc-wasting-lots-of-space-in-my-mcu/171581#171581)_. — Lundin, Sep 14 '15 at 06:23
@Lundin: Heh, didn't know that tag existed... There's surely a reason why it is used in just **19** questions over all SO. — 3442, Sep 14 '15 at 06:23
So to sum it up... you _are_ doing safety-critical applications... With Arduino. And malloc. And stdio. And now think adding setjmp/longjmp on top of that is a splendid idea. There is no polite way for me to tell you what you should do with this project. Just know that in court, you'll end up in jail. — Lundin, Sep 14 '15 at 06:53
@Lundin I'm a student working on a final year project which is effectively a Thermal Aging Oven prototype. There are physical circuit breakers if the oven gets too hot, but the code is the primary controller. Does this make safety critical relevant? Your linked post is very useful; it never occurred to me that I might as well just use up all the memory and code for the worst case. I'll do that. As I asked Olaf, I haven't had much experience coding for embedded systems - could you recommend some learning resources? — Rohan, Sep 14 '15 at 07:59
@Rohan If there is a risk of fire, I'd say it is a safety-critical application. In which case the project manager needs to ensure that there's a risk assessment made prior to the specification. If this shows that the software will be controlling a safety function, you have to make all kinds of precautions, depending on what consequences a failure would have. There will be application-specific safety standards. Developing safety-critical software comes with _a lot_ of overhead, we're talking about something like 2 to 4 times more work, with all the formalities needed. — Lundin, Sep 14 '15 at 10:58
@Rohan - Lundin is more than right. Not only do you need additional overhead for safety-relevant applications, you need some experienced professional(s) in your project team who know how to apply all the rules that apply to safety systems. Otherwise, you don't end up with 2- to 4-fold effort, but the project dies with 5- to 10-fold effort spent in vain. This applies to any kind of "machine" you deliver to other people (commercially or for free) - it doesn't have to be about a nuclear plant or a siege tank. A sewing machine or your oven also fall into this category. — HelpingHand, Apr 16 '20 at 09:24
The important point with this oven is that an electric/electronic/programmable-electronic ("E/E/PE") system is responsible for avoiding hazards. If you implement a "dumb" oven with a simple on/off switch, safety is easier to achieve. This also applies if your Arduino only displays infotainment ("Now put some muscat to the potatoes...") - or if you use a monitor/circuit breaker component that has been qualified for that safety purpose, and that can never be overridden by the embedded system. This component acts like a "safety rope" which only holds if the "unsafe" system leaves its allowable domain. — HelpingHand, Apr 16 '20 at 09:44
@HelpingHand Appreciate you commenting on a question from 5 years ago! A lot has changed since then; this project was back in undergrad uni, and I'm pretty sure it's since been scrapped for parts. We did end up putting a temperature fuse in the oven as an additional layer of protection. Since then I moved into the Oil and Gas industry and have learnt a lot about SIL, SIF, IEC61511 and what safety critical actually means! — Rohan, Apr 17 '20 at 15:06
This sounds great, congratulations! Then you have mastered the topic that you once had to post a question about. I suggest you add an answer here, which collects from your present expertise, how you should have tackled the problem best. In any case - have a nice and successful way on your present, exciting field! — HelpingHand, Apr 17 '20 at 15:12

3442 · Accepted Answer · 2015-09-12T09:48:54.830

1

Maybe you're waiting for the Holy and Sacred setjmp/longjmp, the one who came to save all the memory-hungry stacks of their sins?

#include <setjmp.h>

jmp_buf jumpToMeOnAnError;
void someUpperFunctionOnTheStack() {
    if(setjmp(jumpToMeOnAnError) != 0) {
        // Error handling code goes here

        // Return, abort(), while(1) {}, or whatever here...
    }

    // Do routinary stuff
}

void someLowerFunctionOnTheStack() {
    if(theWorldIsOver)
       longjmp(jumpToMeOnAnError, -1);
}

Edit: Prefer not to use malloc()/free() on embedded systems, for the same reasons you gave. It's simply unmanageable, unless you use a lot of return codes/setjmp()s to free the memory all the way up the stack...
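As a rough, self-contained sketch of the pattern above, assuming a file parser that detects a bad record two calls below the recovery point: the names `recovery_point`, `ERR_BAD_FORMAT`, `parse_record`, `read_file` and `run_with_recovery` are made up for illustration. `setjmp()` is used directly as the controlling expression of a `switch`, which is one of the few contexts the C standard permits for it.

```c
#include <setjmp.h>

/* Hypothetical names for illustration only. */
static jmp_buf recovery_point;

enum { ERR_BAD_FORMAT = 1 };

/* Deep in the call chain: abandon the stack instead of returning normally. */
static void parse_record(int record_is_valid)
{
    if (!record_is_valid) {
        longjmp(recovery_point, ERR_BAD_FORMAT);
    }
}

static void read_file(int record_is_valid)
{
    parse_record(record_is_valid);
}

/* Returns 0 on success, or the error code that was passed to longjmp(). */
int run_with_recovery(int record_is_valid)
{
    switch (setjmp(recovery_point)) {
    case 0:                 /* direct call: do the normal work */
        read_file(record_is_valid);
        return 0;
    case ERR_BAD_FORMAT:    /* arrived here via longjmp() */
        return ERR_BAD_FORMAT;
    default:
        return -1;
    }
}
```

Calling `run_with_recovery(0)` unwinds from `parse_record()` straight back to the `switch`. Nothing allocated between the two points is released, and the function that called `setjmp()` must still be active when `longjmp()` runs.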

edited Sep 12 '15 at 09:48

answered Sep 12 '15 at 09:38

3442


Is this similar to a `goto` statement? – Rohan Sep 12 '15 at 09:41
@Rohan: More or less... `setjmp()`/`longjmp()` let you do something like "a goto across the stack": you can return to a function arbitrarily far up the stack, in the same `goto` style. The standard mandates that `setjmp()` returns zero when it is called directly. Otherwise, that is, if it returned because of a `longjmp()`, the return value is non-zero. – 3442 Sep 12 '15 at 09:44
Using `setjmp` in an embedded system is just a precursor to spaghetti code that is not maintainable. – user3629249 Sep 13 '15 at 16:23
setjmp/longjmp are, just like malloc, incredibly banned from everything that is safety-critical systems... – Lundin Sep 14 '15 at 06:20
@Lundin: And, what's the rationale behind that? With `malloc()`, that's understandable, but what's the problem with `setjmp()`/`longjmp()`? – 3442 Sep 14 '15 at 06:21
@KemyLand To quote the MISRA-C rationale: "setjmp and longjmp allow the normal function call mechanisms to be bypassed. Their use may lead to undefined and unspecified behavior." Personally, I don't believe they should be used just because using them is like admitting that your program design is really bad and you have to resort to questionable ad hoc solutions just to patch it together. Better then to fix the program design instead. – Lundin Sep 14 '15 at 06:33
@Lundin: Again, I see *subjective* points, but not *objective* ones. – 3442 Sep 14 '15 at 07:39
@KemyLand What's subjective with setjmp/longjmp increasing the chances for fatal program crashes? It is like saying that banning the use of function-like macros or goto spaghetti code is subjective, because if you use those features correctly, they won't cause problems. Or if you will "the program will work fine if you don't write any bugs". The fact is that _everyone_ writes bugs, you can look at any program with software metrics and determine the quality based on how many bugs there are per LOC etc, but no moderately-sized program is free of them. It is all about risk reduction. – Lundin Sep 14 '15 at 07:45
@Lundin: I still don't see how those *useful and purposeful tools* can "create more bugs" than that fictitious solution to the problem *for embedded systems* that you haven't given so far... The only one of those I won't defend is function-like macros. – 3442 Sep 14 '15 at 07:57
Again, because of the different forms of poorly-specified behavior associated with the functions. See C11 Annex J p565, the text is too long to post here. – Lundin Sep 14 '15 at 11:34
@Lundin: **Edit** Comment "deleted". – 3442 Sep 14 '15 at 16:41
@Lundin: Well, the standard, in Appendix J, says nothing other than it's implementation-defined "Whether `setjmp` is a macro or an identifier with external linkage." – 3442 Sep 14 '15 at 16:48

sergej · Answer 2 · 2015-09-12T09:51:19.177

1

If your system has a watchdog, you could use:

char *x = malloc( some_bytes ); 
assert(x != NULL);

The implementation of assert() could be something like:

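/* No space between the macro name and '(' or this becomes an object-like macro. */
#define assert(condition) \
    do { if (!(condition)) { for (;;) { } } } while (0) /* spin until the watchdog resets */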

On a failure, the endless loop stops the watchdog from being serviced, so the watchdog triggers and resets the system. At restart the system checks the reset reason; if the reset reason was "watchdog reset", the system goes to a safe state.

update

Before entering the loop, assert could also output an error message, print the stack trace or save some data in non-volatile memory.
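The boot-time decision described above might look roughly like this. It is only a sketch: the enum values and `state_after_reset` are hypothetical, and on real hardware the reset cause comes from a device-specific status register that start-up code has to read and clear.

```c
#include <stdbool.h>

/* Hypothetical reset causes; real parts report these through a
   device-specific status register. */
typedef enum { RESET_POWER_ON, RESET_EXTERNAL, RESET_WATCHDOG } reset_reason_t;

typedef enum { STATE_NORMAL, STATE_SAFE } system_state_t;

/* Decide the start-up state from the reset cause read at boot. */
system_state_t state_after_reset(reset_reason_t reason)
{
    return (reason == RESET_WATCHDOG) ? STATE_SAFE : STATE_NORMAL;
}

/* In the safe state the outputs stay off; only reporting is allowed. */
bool outputs_allowed(system_state_t state)
{
    return state == STATE_NORMAL;
}
```

Because the decision is made on a fresh stack right after reset, it does not depend on how much memory was left when the assertion fired.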

edited Sep 12 '15 at 09:51

answered Sep 12 '15 at 09:43

sergej


But all the data is lost in the process... If not handled correctly, this can lead to very dirty conditions... – 3442 Sep 12 '15 at 09:45
Well, this works as long as the system supports NVRAM, watchdogs, et al. – 3442 Sep 12 '15 at 09:57
@KemyLand, an embedded system, usually, does not have a human watching it, so it will have configuration data, long term battery backed up (or NVM or FLASH) storage, just so it can recover across power cycles, watchdog events, etc. An embedded system will have the concept of warm and cold boots implemented, just so it can properly (and quickly) recover from reset events of any kind without losing critical data. – user3629249 Sep 13 '15 at 16:37
@user3629249: Well, that's a good point, but I don't think *all* embedded systems have such (although necessary) systems. For example, think of handheld consoles. **P.S.**: I'm not making any assertion here, just curious if this is a *de-facto* standard across the embedded industry. I don't know if, for example, the Nintendo Gameboy had any watchdog timer. – 3442 Sep 13 '15 at 22:01
@KemyLand, I wrote the device driver and first two communication protocols for the Gameboy RF communication. Because the OS was running, I did not need to worry about the watchdog; however, I expect that there was a watchdog in operation. I have been writing embedded code for nearly 40 years. There was ALWAYS a watchdog and a plan implemented for when the watchdog event occurs. – user3629249 Sep 17 '15 at 19:45

score 1 · Answer 3 · edited Apr 13 '17 at 12:32

Is there a standard way that safe error handling is implemented in low memory embedded systems?

Yes, there is an industry de facto way of handling it. It is all rather simple:

For every module in your program you need to have a result type, such as a custom enum, which describes every possible thing that could go wrong with the functions inside that module.
You document every function properly, stating what codes it will return upon error and what code it will return upon success.
You leave all error handling to the caller.
If the caller is another module, it too passes the error on to its own caller, possibly translating it into an error code more suitable for its own level, where applicable.
The error handling mechanism is located in main(), at the bottom of the call stack.

This works well together with classic state machines. A typical main would be:

void main (void)
{
  for(;;)
  {
    serve_watchdog();

    result = state_machine();

    if(result != good)
    {
      error_handler(result);
    }
  }
}
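The per-module result types and the "caller handles it" rule could be sketched as follows. Everything here is an assumption made for illustration: a file-reading module with its own enum, a caller that translates those codes into application-level ones, and a made-up header byte `0x7F` standing in for "the file has the correct format".

```c
/* Hypothetical file-reader module with its own result type. */
typedef enum {
    FILE_OK,
    FILE_ERR_OPEN,
    FILE_ERR_BAD_FORMAT
} file_result_t;

/* Hypothetical application-level result type, as seen by main(). */
typedef enum {
    APP_OK,
    APP_ERR_STORAGE,
    APP_ERR_PROFILE_INVALID
} app_result_t;

/* Returns FILE_OK if the header byte matches, FILE_ERR_BAD_FORMAT otherwise. */
file_result_t file_check_header(unsigned char first_byte)
{
    return (first_byte == 0x7Fu) ? FILE_OK : FILE_ERR_BAD_FORMAT;
}

/* The caller does no error handling of its own: it translates the
   module's code into its own vocabulary and passes it up. */
app_result_t load_profile(unsigned char first_byte)
{
    switch (file_check_header(first_byte)) {
    case FILE_OK:             return APP_OK;
    case FILE_ERR_BAD_FORMAT: return APP_ERR_PROFILE_INVALID;
    default:                  return APP_ERR_STORAGE;
    }
}
```

By the time the code reaches the loop in main, every intermediate function has already returned, so the error handler runs with the stack at its shallowest.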

You should not use malloc in bare-metal or RTOS microcontroller applications, not so much for safety reasons, but simply because it doesn't make any sense whatsoever to use it. Apply common sense when programming.

score 0 · Answer 4 · edited Apr 13 '17 at 12:50

Use setjmp(3) to set a recovery point and longjmp(3) to jump to it, restoring the stack to what it was at the setjmp point. It won't free malloc'ed memory.

Generally, it is not a good idea to use malloc/free in an embedded program if it can be avoided. For example, a static array may be adequate, or even using alloca() is marginally better.
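A static array in place of `malloc()` might look like the sketch below. The 64-byte size is an assumed worst-case line length, and `store_line`/`stored_line` are hypothetical helpers; the point is only that the memory is reserved at link time, so "out of memory" becomes an ordinary length check.

```c
#include <stddef.h>

#define LINE_BUFFER_SIZE 64u  /* assumed worst-case line length, including the terminator */

/* Reserved at link time: no heap, no allocation failure at run time. */
static char line_buffer[LINE_BUFFER_SIZE];

/* Copies src into the static buffer. Returns the number of characters
   stored, or -1 (leaving the buffer empty) if src does not fit. */
int store_line(const char *src)
{
    size_t i = 0;
    while (src[i] != '\0') {
        if (i >= LINE_BUFFER_SIZE - 1u) {
            line_buffer[0] = '\0';
            return -1;
        }
        line_buffer[i] = src[i];
        i++;
    }
    line_buffer[i] = '\0';
    return (int)i;
}

const char *stored_line(void)
{
    return line_buffer;
}
```

An over-long input is rejected with a return code instead of failing unpredictably deep in the call chain.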

edited Apr 13 '17 at 12:50

Community

answered Sep 12 '15 at 09:37

meuh


"Or even using `alloca()` is marginally better." - meuh. That's my quote of the day :). – 3442 Sep 12 '15 at 09:46
I don't know if Arduino C allows dynamic array size declarations, which can replace `alloca()`, but stack overflow is best avoided by doing nothing dynamically, and not recursing, and so on. – meuh Sep 12 '15 at 09:53
The OP indicated the stack is already over committed, so `alloca()` would be a bad choice – user3629249 Sep 13 '15 at 16:40

score 0 · Answer 5 · answered Sep 13 '15 at 17:23

To minimize stack usage:

Write the program so that the calls sit side by side ("in parallel") at one level, rather than as a function that calls a sub-function that calls a sub-function that calls a sub-function, and so on. That is, the top-level function calls a sub-function, which promptly returns with status info; the top-level function then calls the next sub-function, etc.

The (bad for stack limited) nested method of program architecture:

top level function
    second level function
        third level function
            fourth level function

should be avoided in embedded systems

the preferred method of program architecture for embedded systems is:

top level function (the reset event handler)
    (variations in the following depending on if 'warm' or 'cold' start)
    initialize hardware
    initialize peripherals
    initialize communication I/O
    initialize interrupts
    initialize status info
    enable interrupts
    enter background  processing

interrupt handler
    re-enable the interrupt
    using 'scheduler' 
        select a foreground function 
        trigger dispatch for selected foreground function        
    return from interrupt

background processing 

(this can be, and often is implemented as a 'state' machine rather than a loop)
    loop:
        if status info indicates need to call second level function 1 
            second level function 1, which updates status info
        if status info indicates need to call second level function 2
            second level function 2, which updates status info
        etc
    end loop:

Note that, as much as possible, there is no 'third level function x'

Note that, the foreground functions must complete before they are again scheduled.
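One pass of the background loop outlined above could be sketched like this. The flag and function names are hypothetical, and the two `on_..._interrupt` functions only stand in for what real interrupt handlers would do; a real system would also need to protect the flag updates against races.

```c
#include <stdbool.h>

/* Hypothetical status flags: set from interrupt context, cleared by the
   background loop, hence volatile. */
static volatile bool sample_ready;
static volatile bool report_due;

static unsigned samples_processed;
static unsigned reports_sent;

/* Second-level functions: each does its work, updates status, returns. */
static void process_sample(void) { samples_processed++; sample_ready = false; }
static void send_report(void)    { reports_sent++;      report_due = false; }

/* One pass of the background loop; every call is made from this level,
   so the stack never grows deeper than one call. */
void background_pass(void)
{
    if (sample_ready) { process_sample(); }
    if (report_due)   { send_report(); }
}

/* Stand-ins for the interrupt handlers. */
void on_sample_interrupt(void) { sample_ready = true; }
void on_timer_interrupt(void)  { report_due = true; }

unsigned get_samples_processed(void) { return samples_processed; }
unsigned get_reports_sent(void)      { return reports_sent; }
```

On the target, the top-level function would call `background_pass()` forever after initialization, kicking the watchdog on each pass.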

Note: there are lots of other details that I have omitted in the above, like

kicking the watchdog, 
the other interrupt events,
'critical' code sections and use of mutex(),
considerations between 'soft real-time' and 'hard real-time',
context switching
continuous BIT, commanded BIT, and error handling 
etc
