C API design: what to do when malloc returns NULL?

Question

Let's say I'm writing a little library in C -- some data structure, say. What should I do if I'm unable to allocate memory?

It might be pretty important, e.g. I need some memory to initialize the data structure in the first place, or I'm inserting a key-value pair and want to wrap it in a little struct. It could also be less critical, for instance something like a pretty_print function that builds up a nice string representation of the contents. However, it's typically more serious than your average error -- there might not be a point in continuing at all. A ton of sample uses of malloc online just straight up exit the program if it returns NULL. I'm guessing a lot of real client code does that too -- just pop up some error, or write it to stderr, and abort. (And a lot of real code probably doesn't check the return value of malloc at all.)

Sometimes it makes sense to return NULL, but not always. Error codes (or just some boolean success value), either as return values or as out parameters, work fine, but it seems like they can clutter up the API or hurt its readability (then again, maybe that's somewhat expected in a language like C?). Another option is to have some sort of internal error state the caller can subsequently query, e.g. with a get_error function, but then you have to be careful about thread safety, and it might be easy to miss; people tend to be lax about checking for errors anyway, and if it's a separate function altogether they might not know about it, or they might not bother (but then I guess that's their problem).

(I've sometimes seen malloc wrapped in a function that just tries again until memory is available...

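#include <stdlib.h>

/* Spins until malloc succeeds: never returns NULL, but may never return at all. */
void *my_malloc(size_t size)
{
    void *result = NULL;
    while (result == NULL)
        result = malloc(size);
    return result;
}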

But that seems kind of silly and maybe dangerous.)

What's a proper way to handle this?

@R. whoopy, I totally skipped the first line when reading this question. In that case, return FALSE/NULL and add an error state function for details. — Till, Jan 29 '12 at 03:22
see [What is the correct way to handle “out of memory”?](http://stackoverflow.com/questions/1439977/what-is-the-correct-way-to-handle-out-of-memory) or [What are out-of-memory handling strategies in C programming?](http://stackoverflow.com/questions/3477652/what-are-out-of-memory-handling-strategies-in-c-programming) — jschmier, Jan 29 '12 at 06:37
The loop-until-memory-is-available may be a reasonable strategy in some cases (though much better to do it with sleep in the loop, to get out of the way so something else can free the memory!). Those cases are generally ones where you control everything running on the system and where you know that you're not going to get all your applications deadlocked in the same loop waiting for each other to free memory, though; certainly not in a standalone application. — Brooks Moses, Jan 29 '12 at 07:25

score 14 · Accepted Answer · answered Jan 29 '12 at 03:24

If allocation fails in a way that prevents forward progress, the only acceptable solution for library code is to back out whatever allocations and other changes have already been made in the partially completed operation and return a failure code to the caller. Only the calling application can know what the right way to proceed is. Some examples:

A music player might just abort or return to an initial/stopped state and wait for user input again.
A word processor might need to store an emergency dump of the current document state to a recovery file then abort.
A high-end database server might need to reject and back out the whole transaction and report to the client.
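A minimal sketch of the back-out-and-report pattern described above, assuming a hypothetical key-value list (`kv_map`, `kv_insert`, and the status codes are made up for illustration). The `kv_fail_at` counter is only a fault-injection hook so the failure path can be exercised; a real library would not necessarily have one.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes for this sketch. */
enum kv_status { KV_OK = 0, KV_ENOMEM = -1 };

struct kv_node {
    char *key;
    char *value;
    struct kv_node *next;
};

struct kv_map {
    struct kv_node *head;
    size_t count;
};

/* Fault-injection hook (an assumption of this sketch): when >= 0, it counts
   down once per allocation, and the allocation that finds it at 0 fails. */
static int kv_fail_at = -1;

static void *kv_alloc(size_t n)
{
    if (kv_fail_at == 0) {
        kv_fail_at = -1;
        return NULL;
    }
    if (kv_fail_at > 0)
        kv_fail_at--;
    return malloc(n);
}

static char *kv_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = kv_alloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Inserts copies of key and value at the head of the list. If any allocation
   fails, everything allocated so far is freed and the map is left untouched. */
enum kv_status kv_insert(struct kv_map *map, const char *key, const char *value)
{
    struct kv_node *node = kv_alloc(sizeof *node);
    if (node == NULL)
        return KV_ENOMEM;

    node->key = kv_strdup(key);
    node->value = kv_strdup(value);
    if (node->key == NULL || node->value == NULL) {
        free(node->key);    /* free(NULL) is a no-op */
        free(node->value);
        free(node);
        return KV_ENOMEM;
    }

    /* Only now, with all memory in hand, is the map modified. */
    node->next = map->head;
    map->head = node;
    map->count++;
    return KV_OK;
}
```

The point of the ordering is that the map is only touched after every allocation has succeeded, so a caller that gets `KV_ENOMEM` can keep using the map as if the call had never happened.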

If you follow the oft-advised but backwards idea that your library should just abort the caller on allocation failure, then either many programs will determine they cannot use your library for this reason, or the users of the programs that do use your library will be extremely angry when an allocation failure causes their valuable data to be thrown away.

Edit: One objection some of the "abort" camp will raise against my answer is that, on systems with overcommit, even calls to malloc that appeared to succeed may fail when the kernel tries to instantiate physical storage for the virtual memory allocated. This ignores the fact that anyone needing high reliability will have overcommit disabled, as well as the fact that (at least on 32-bit systems) allocation failure is more likely due to virtual address space exhaustion than physical storage exhaustion.

score 4 · Answer 2 · answered Jan 29 '12 at 03:21

Simply return an error in whatever way you normally do. Since we're talking about an API, you don't know what environment you're being called from, so just return NULL or follow whatever other error-handling procedure you already use. You don't want to loop forever, since the caller may not really need that memory and they would rather just know that you can't handle it, or perhaps the caller has a user-interface that they can send the error to.

Most APIs will have some sort of return value that indicates an error across all functions; other APIs require that the caller call a special "check_error" function to determine if there is an error. You may also want a "get_error" function to return an error string that the caller can optionally display to the user or include in a log. It should be descriptive: "so-and-so API has encountered an error in function whatever: unable to allocate memory". Or whatever. Enough that when someone gets the error, they know what component threw it, and when they email you the log message, you know exactly what went wrong.
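Here is a rough sketch of the "one status type everywhere, plus a function that turns it into a message" idea. All names (`mylib_status`, `mylib_strerror`, `mylib_buffer_create`) are hypothetical. Returning the status directly, rather than storing it in library-wide state, sidesteps the thread-safety concern raised in the question.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status codes shared by every function in the library. */
typedef enum {
    MYLIB_OK = 0,
    MYLIB_ERR_NOMEM,
    MYLIB_ERR_INVALID
} mylib_status;

/* Maps a status code to a descriptive static string; never returns NULL. */
const char *mylib_strerror(mylib_status status)
{
    switch (status) {
    case MYLIB_OK:          return "mylib: success";
    case MYLIB_ERR_NOMEM:   return "mylib: unable to allocate memory";
    case MYLIB_ERR_INVALID: return "mylib: invalid argument";
    }
    return "mylib: unknown error";
}

/* Allocates a buffer of `size` bytes and stores it in *out.
   The result travels through the out parameter so the return value
   is free to carry the status. */
mylib_status mylib_buffer_create(size_t size, void **out)
{
    if (out == NULL || size == 0)
        return MYLIB_ERR_INVALID;
    *out = malloc(size);
    return *out != NULL ? MYLIB_OK : MYLIB_ERR_NOMEM;
}
```

A caller would typically check for anything other than `MYLIB_OK` and pass the status to `mylib_strerror` when writing to a log or showing a dialog.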

Of course you can also just crash, but that prevents the caller from closing out anything else they may have been doing, and it looks ugly. If your code has a habit of dying when it is called rather than returning an error, people are going to look for a library that doesn't actively try to kill their program.

Brooks Moses · Answer 3 · 2012-01-29T07:19:17.280

The BLAS standard API for linear-algebra functions uses a somewhat different approach from the "return an error code" suggestions given here: on error, it calls a specific documented function, and then returns.

Now, the library also provides an implementation of this documented function, which prints a useful error message and (where possible) a stack trace, and then aborts. That's one way of handling things, and it means that the casual user is not going to run into bizarre problems because they forgot to check for an error code.

However, the critical point of this being a specific documented function is that it means that the user can also choose to provide their own implementation of that function, which will override the default one. And that implementation can do many things -- it can set a global error code that the user then checks, or it can do something that attempts to clear up some memory and continue.

That's sort of a heavyweight solution, both for the implementation and the user, but in cases where an obvious error code isn't appropriate, it provides a lot of flexibility.

Edit to add a few more details: The BLAS function is xerbla (or cblas_xerbla, in the C interface) with the expectation that one would override it at link time -- the assumption is static linking. There are also some relevant notes about how this has to be adjusted for dynamic libraries in this header from Apple (see the comments on SetBLASParamErrorProc, near the bottom of the file) -- in the dynamic-linking case, a callback needs to be registered at runtime.

See also the useful notes by "R." in comments below about how this overriding is unfortunately global, which can cause problems if a user uses your library both directly and indirectly through a second library, and both the user and the second library want to override the handler.
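One way to keep the flexibility of a user-supplied handler without the global state is to store the handler in the object the library functions already take. The following is a sketch under that assumption, not how BLAS does it; `mylib_list`, `mylib_error_fn`, and `count_failures` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical handler type: gets the caller's own data pointer and a
   short description of what failed. */
typedef void (*mylib_error_fn)(void *user_data, const char *what);

typedef struct {
    int *items;
    size_t len, cap;
    mylib_error_fn on_error;   /* may be NULL: failures are then only returned */
    void *user_data;
} mylib_list;

void mylib_list_init(mylib_list *l, mylib_error_fn on_error, void *user_data)
{
    l->items = NULL;
    l->len = l->cap = 0;
    l->on_error = on_error;
    l->user_data = user_data;
}

/* Returns 0 on success, -1 on allocation failure (after notifying the
   handler stored in this particular list). The list is unchanged on failure. */
int mylib_list_append(mylib_list *l, int value)
{
    if (l->len == l->cap) {
        int *grown = NULL;
        size_t new_cap = l->cap ? l->cap * 2 : 8;
        /* Refuse capacities whose byte count would overflow size_t. */
        if (l->cap <= SIZE_MAX / (2 * sizeof *grown))
            grown = realloc(l->items, new_cap * sizeof *grown);
        if (grown == NULL) {
            if (l->on_error != NULL)
                l->on_error(l->user_data, "mylib_list_append: out of memory");
            return -1;
        }
        l->items = grown;
        l->cap = new_cap;
    }
    l->items[l->len++] = value;
    return 0;
}

/* Example application-side handler: counts failures in an int it owns. */
static void count_failures(void *user_data, const char *what)
{
    (void)what;
    ++*(int *)user_data;
}
```

Because each list carries its own handler, an application and a second library that both use `mylib_list` cannot overwrite each other's choice. The function still returns -1, so callers that never install a handler are not left guessing.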

That's a really neat idea. Keeping with the data structure example, the caller might already be providing callbacks for e.g. freeing, comparing, hashing, or converting values to strings -- adding to that an error-handling callback makes sense. — Ismail Badawi, Jan 29 '12 at 03:59
This is actually a really bad design because it's stateful (involves global variables/state), unless all your library functions take a "pointer to context" argument. Imagine the case of an application that uses both libfoo, and libbar which indirectly uses libfoo. Your application sets up the hook function for how to deal with allocation failures, and then libbar replaces that with its own hook. Now your application's handler doesn't get called. Not cool. Libraries should never have this type of global state; it's extremely bad design. — R.. GitHub STOP HELPING ICE, Jan 29 '12 at 04:08
Ah, I hadn't thought of it that way; I was thinking of libraries whose functions behave more or less like methods on opaque structs (e.g. `mylib_append(mylib_list* l, void *element)`). In that case the callback would be an attribute of the struct, which would also be acting as the "context" you mention. — Ismail Badawi, Jan 29 '12 at 04:20
Yes, that's a perfectly good way to do it, but of course it's a pain for the caller to set the callback attribute on every object it creates. I think it would be easier to just pass the callback function pointer to every function, or just use return values and leave the caller with the responsibility for error handling. — R.. GitHub STOP HELPING ICE, Jan 29 '12 at 04:48
I wouldn't recommend anyone, ever, to look at BLAS for an API design example. — Fred Foo, Oct 28 '13 at 15:44

score 3 · Answer 4 · answered Jan 29 '12 at 05:11

For a library, you have two choices. With no application cooperation, pretty much all you can do is pass an error back to the application.

With application cooperation, you can do much more. For example, you can offer the application to register a callback that your library calls when malloc returns NULL. You can pass the callback the number of bytes you need and how urgently you need them. (For example, "will totally fail to operate", "will have to abort an operation", and so on.) Then the application author can decide whether to give you the memory or not.
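A sketch of what such a cooperative callback could look like. Everything here is hypothetical: the urgency levels, the `mylib_ctx` struct, and in particular `app_alloc` and `app_budget`, which are stand-ins with an artificial memory budget so that the failure path can actually be exercised.

```c
#include <stdlib.h>

typedef enum {
    MYLIB_OOM_OPTIONAL,   /* a nicety, e.g. a cache; can simply be skipped */
    MYLIB_OOM_OPERATION,  /* the current operation will have to be aborted */
    MYLIB_OOM_FATAL       /* the library will totally fail to operate */
} mylib_urgency;

/* Returns nonzero if the application freed something and a retry is worthwhile. */
typedef int (*mylib_low_memory_fn)(size_t bytes_needed, mylib_urgency urgency,
                                   void *user_data);

typedef struct {
    void *(*alloc)(size_t);          /* allocator to use, e.g. malloc */
    mylib_low_memory_fn low_memory;  /* may be NULL */
    void *user_data;
} mylib_ctx;

/* Library side: ask the application for help once, then give up. */
void *mylib_alloc(const mylib_ctx *ctx, size_t size, mylib_urgency urgency)
{
    void *p = ctx->alloc(size);
    if (p == NULL && ctx->low_memory != NULL
        && ctx->low_memory(size, urgency, ctx->user_data))
        p = ctx->alloc(size);   /* exactly one retry; never loop forever */
    return p;
}

/* Application side (stand-ins): an allocator with an artificial budget. */
static size_t app_budget = 64;

static void *app_alloc(size_t size)
{
    if (size > app_budget)
        return NULL;
    app_budget -= size;
    return malloc(size);
}

/* Declines optional requests; otherwise pretends to drop a cache. */
static int app_low_memory(size_t bytes_needed, mylib_urgency urgency,
                          void *user_data)
{
    (void)user_data;
    if (urgency == MYLIB_OOM_OPTIONAL)
        return 0;
    app_budget += bytes_needed;
    return 1;
}
```

The library still returns NULL if the application declines or the retry fails, so this layers on top of ordinary error returns rather than replacing them.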

At the application level, you can do much more. For example, you can malloc a bunch of blocks of memory to use as an "emergency pool". If malloc fails, you can free blocks from the pool and begin load shedding, cache reduction, or whatever other choices you have to reduce memory consumption.
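The emergency pool could be sketched roughly like this. The block count and size are arbitrary, and the names (`emergency_pool_init`, `emergency_pool_release`, `app_malloc`) are invented for the example. Note two assumptions: freeing a pool block is not guaranteed to make a later request succeed (fragmentation), and on systems with overcommit, memory that was reserved but never touched may not be backed by physical storage.

```c
#include <stdlib.h>

#define POOL_BLOCKS 4
#define POOL_BLOCK_SIZE (64 * 1024)

static void *emergency_pool[POOL_BLOCKS];
static size_t emergency_left;   /* number of blocks still held in reserve */

/* Reserve the pool at startup. Returns 0 on success, -1 if even that fails. */
int emergency_pool_init(void)
{
    for (emergency_left = 0; emergency_left < POOL_BLOCKS; emergency_left++) {
        emergency_pool[emergency_left] = malloc(POOL_BLOCK_SIZE);
        if (emergency_pool[emergency_left] == NULL) {
            while (emergency_left > 0)
                free(emergency_pool[--emergency_left]);
            return -1;
        }
    }
    return 0;
}

/* Hand one reserved block back to the allocator.
   Returns 1 if a block was released, 0 if the pool is already empty. */
int emergency_pool_release(void)
{
    if (emergency_left == 0)
        return 0;
    free(emergency_pool[--emergency_left]);
    emergency_pool[emergency_left] = NULL;
    return 1;
}

/* Application-level malloc wrapper: on failure, release reserve blocks one
   at a time and retry. The loop is bounded by the pool size. A real
   application would also start shedding load here. */
void *app_malloc(size_t size)
{
    void *p = malloc(size);
    while (p == NULL && emergency_pool_release())
        p = malloc(size);
    return p;
}
```

Once the pool is empty, `app_malloc` returns NULL like plain `malloc`, so callers still need an error path.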

But you generally can't do much in a library except inform a cooperating application.

score 2 · Answer 5 · answered Jan 29 '12 at 03:29

It's difficult to design software to cleanly handle out of memory issues and still proceed. Few real apps make a serious attempt to do this. As a library author the only reasonable thing to do is report an error to the caller. (return a failure; throw an exception; depending on the language etc.)

You definitely don't want to loop and block in a general library. @R has a good point: if a failure occurs, try to restore state to its original condition.

Handling out-of-memory and out-of-disk-space problems likely requires coordination in all parts of the app. You might want to have preallocated contingency memory. You'll probably loop/retry mallocs like you were intending but with some delays in between with a timeout. It's really beyond the scope of a typical library.

score 2 · Answer 6 · edited May 23 '17 at 11:45

In a language like Java or C#, the unequivocal answer is usually "throw an exception!".

In C, one common approach is to handle the error synchronously (e.g. with a result code, and/or a flag value like "null").

One can also generate an asynchronous signal (much like a Java "exception" ... or a heavy-handed "abort()"). With this approach, you can also allow the user to install a custom "error handler".

Here's an example using setjmp/longjmp:

http://www.di.unipi.it/~nids/docs/longjump_try_trow_catch.html
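For readers who don't want to follow the link, here is a small independent sketch of the setjmp/longjmp approach (not taken from the linked page). The names `mylib_env`, `mylib_make_array`, and `app_try_make_array` are hypothetical. A request whose byte count would overflow `size_t` is treated the same as a failed allocation.

```c
#include <setjmp.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-call environment: the caller arms on_error with setjmp
   before calling into the library. */
typedef struct {
    jmp_buf on_error;
} mylib_env;

/* Library side: returns a zeroed array of `count` ints (count must be > 0),
   or jumps back to the caller's setjmp point if memory cannot be obtained. */
int *mylib_make_array(mylib_env *env, size_t count)
{
    int *a = NULL;
    if (count <= SIZE_MAX / sizeof *a)
        a = malloc(count * sizeof *a);
    if (a == NULL)
        longjmp(env->on_error, 1);   /* does not return */
    for (size_t i = 0; i < count; i++)
        a[i] = 0;
    return a;
}

/* Caller side: returns 0 on success, -1 if the library bailed out. */
int app_try_make_array(size_t count)
{
    mylib_env env;
    if (setjmp(env.on_error) != 0)
        return -1;                   /* arrived here via longjmp */
    int *a = mylib_make_array(&env, count);
    free(a);
    return 0;
}
```

The usual caveats apply: `longjmp` skips any cleanup in the frames it unwinds, so the library must not be holding resources at the point it jumps, and locals in the `setjmp` frame that are modified after `setjmp` need to be `volatile` if they are read after the jump.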

And here are some interesting ideas:

http://blog.staila.com/?p=114

And here's a good discussion on the pros/cons of using C callbacks for error handling:

Using callback functions for error handling in C

Ed Heal · Answer 7 · 2015-08-14T15:03:47.907

score 1

You can write software that does not need malloc in the first place, as is done for safety-critical stuff.

You just ensure that, at the start of execution, it allocates and defines everything it needs, and that the algorithms will never exceed those bounds.

Hard, but not impossible.
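As an illustration of the idea (my own sketch, with made-up names and an arbitrary capacity), here is a fixed-capacity queue whose storage is part of the struct itself. It never calls malloc, so "out of memory" turns into an ordinary, predictable "queue full" result.

```c
#include <stddef.h>

#define QUEUE_CAPACITY 16

/* All storage lives inside the struct; nothing is allocated at run time. */
typedef struct {
    int items[QUEUE_CAPACITY];
    size_t head;    /* index of the oldest element */
    size_t count;   /* number of elements currently stored */
} fixed_queue;

/* Returns 0 on success, -1 if the queue is full. */
int fixed_queue_push(fixed_queue *q, int value)
{
    if (q->count == QUEUE_CAPACITY)
        return -1;
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = value;
    q->count++;
    return 0;
}

/* Stores the oldest element in *out. Returns 0 on success, -1 if empty. */
int fixed_queue_pop(fixed_queue *q, int *out)
{
    if (q->count == 0)
        return -1;
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return 0;
}
```

The hard part the answer alludes to is proving that `QUEUE_CAPACITY` (and every similar bound) is large enough for all inputs the program must handle.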

edited Aug 14 '15 at 15:03

answered Jan 29 '12 at 03:36
