Sorry to rain on the parade, but this cannot be done using the above mechanism, regardless of the atomic increment/decrement primitives used.

The instant that `release` does the `free`, the object becomes invalid [we must assume that another thread does an instantaneous `malloc` and repurposes the memory] and no further access to it can be done by any thread.

After the `free`, neither `retain` nor `release` may be called for that object, not even to merely probe the `ref_count` value. The simple `ref_count` inc/dec [atomic or not] is insufficient to handle/prevent that.
(1) The interthread lock must reside outside the object and it must not be subject to any alloc/free.
(2) Access to the object(s) must be done via some sort of list traversal. That is, there is a list of active objects.
(3) Access to the list is controlled by a mutex. Because of that, the actual inc/dec [probably] does not need to be atomic [but could be for extra safety]
(4) Using the list assures that once an object has been destroyed, no thread will try to access it, because it has been removed from the active objects list and the threads can no longer "see" it.
The `retain` and `release` must do something like:
```c
// Assumes an intrusive singly-linked list: list->head / objnow->next.
int
retain(List *list, Object *object)
{
    int match = 0;

    lock_list(list);

    // only an object still on the active list may be retained
    for (Object *objnow = list->head; objnow != NULL; objnow = objnow->next) {
        if (objnow == object) {
            ++objnow->ref_count;
            match = 1;
            break;
        }
    }

    unlock_list(list);

    return match;
}
```
```c
int
release(List *list, Object *object)
{
    int match = 0;

    lock_list(list);

    for (Object *objnow = list->head; objnow != NULL; objnow = objnow->next) {
        if (objnow == object) {
            match = 1;
            if (--objnow->ref_count == 0) {
                // unlink first so no other thread can find it, then free
                unlink_from_list(list, objnow);
                free(objnow);
                match = -1;
            }
            break;
        }
    }

    unlock_list(list);

    return match;
}
```
The mutex/lock method above on the list might also be done with RCU, but that's a bit more complicated.

Of course, "list" here needn't be a simple linked list. It could be a B-tree or some other sort of container.
A notion: actually, when thinking about it, if an object is not attached to some sort of global/interthread list, the `ref_count` tends to lose its meaning. Or, more importantly, why would there be interthread contention on `ref_count`?

If we merely have some "floating" objects that are not on a list [or are on a local per-thread list], why would multiple threads be trying to up/down the `ref_count`, since it's more likely that a single thread would "own" the object at that point?

Otherwise, rearchitecting the system might be in order to make it more predictable/stable.
UPDATE:

> A thread may not bump the reference count unless it already has a reference, since a reference is needed to access the object.

By "having a reference", here, I presume that you mean the thread has done a `retain`, will do some stuff, and then do a `release`.

> Thus if the ref count hits zero, no thread is currently accessing the object nor can any thread do so in the future. Thus it's safe to destroy it.

It may be safe to destroy it, but there is no interlock against multiple threads accessing data [non-lock] cells within the object and colliding.
The problem is having a subthread do the `free`.

Consider that we have a main thread that creates an object `obj1`, which gets handed off to two threads `tA` and `tB`, which refer to it internally as `objA` and `objB` respectively. The main thread starts `obj1` with a refcount of zero.

Consider the following timeline:
```
tA: retain(objA)
tA: // do stuff ...
tA: release(objA)
```
The object refcount is now zero and the memory area has been freed. Any further access is invalid. `tB` may not access the memory area for `obj1` in any way.
Now, we do [if we choose to ignore that]:
```
tB: retain(objB)
tB: // do stuff ...
tB: release(objB)
```
`tB`'s release will see the refcount go to zero and will do the `free`. This is a double free of `obj1`.

But `tB` can't even do the `retain`, because the memory for `obj1` may have been reallocated by another thread: (1) the main thread, for an `obj2`, or (2) another thread `tX` that uses the memory for a completely unrelated purpose.

In case (1), `tB`'s `objB` is now changing `obj2` instead of `obj1`. In case (2), `objB` is scribbling on `tX`'s unrelated memory area. Even a momentary inc/dec is disastrous.

So, in the above, there are race conditions, access to already freed memory, double frees, and writing to (e.g.) `objtype_x` as if it were `objtype_a`.
So, what if we have the main thread initialize with a refcount of one instead of zero?

Now, things work better. Race conditions are eliminated. But `tA` and `tB` will never see the refcount drop below one, so neither of them will ever do a `free`. So, having the individual threads do the `free` is a moot point.

The main thread will have to do the `free`, which would be safe. But, main has no way to know what state `obj1` is in. That is, has it been processed by `tA`, `tB`, or both?
So, maybe the object needs a `done` mask that gets OR'ed [atomically] with `1 << tA` and `1 << tB`, and main will look at this to know when it may do the `free`.
Or, if the main thread knows that only the two threads `tA` and `tB` will access the object, it could initialize the refcount to two and the two threads could just do a `release` when they are done with the object.
This doesn't work too well if `tB` decides that, after doing its own processing, it needs to send the object off to `tC`.

And with just a refcount, if the given object must be processed by `tA` before `tB`, there is no way to ensure this.
Architecturally, this whole system might work better if each thread had an input queue/list [that is mutex locked]. The main thread creates an object and queues it to `tA`. `tA` dequeues it, does work, and enqueues it to `tB`. Each thread might do a "Y" fork. That is, `tA` looks at the object and decides to send it to `tC`, bypassing `tB` entirely. Eventually, one of the threads will queue the object back to the main thread (i.e. to the free list for spent objects, or to post a result to main (e.g. a form of map/reduce)).
Putting an object on a [reusable] free list (vs. doing `free`) eases things a bit, because we don't have the "rug pull" effect of doing a `free` [with immediate `malloc`], so we can store some state information in the object that stays around even if the object is "idle".

So, we have the effect of an interthread pipeline system. One of the virtues of that approach [which I've used successfully in shipping production systems] is that once an object is queued to a thread, the thread "owns" the object and most of the access needn't be atomic.