Sorry to rain on the parade, but this cannot be done using the above mechanism, regardless of the atomic increment/decrement primitives used.

The instant that `release` does the `free`, the object becomes invalid [we must assume that another thread does an instantaneous `malloc` and repurposes the memory] and no further access to it can be done by any thread.

After the `free`, neither `retain` nor `release` may be called for that object, not even to merely probe the `ref_count` value. The simple `ref_count` inc/dec [atomic or not] is insufficient to handle/prevent that.
(1) The interthread lock must reside outside the object and it must not be subject to any alloc/free.
(2) Access to the object(s) must be done via some sort of list traversal. That is, there is a list of active objects.
(3) Access to the list is controlled by a mutex. Because of that, the actual inc/dec [probably] does not need to be atomic [but could be for extra safety]
(4) Using the list assures that once an object has been destroyed, no thread will try to access it, because it has been removed from the active objects list and the threads can no longer "see" it.
The `retain` and `release` must do something like:
```c
// Assumes an intrusive singly-linked list: list->head / objnow->next.
int
retain(List *list, Object *object)
{
    int match = 0;

    lock_list(list);

    // only an object still on the active list may be retained
    for (Object *objnow = list->head; objnow != NULL; objnow = objnow->next) {
        if (objnow == object) {
            ++objnow->ref_count;
            match = 1;
            break;
        }
    }

    unlock_list(list);

    return match;
}
```
```c
int
release(List *list, Object *object)
{
    int match = 0;

    lock_list(list);

    for (Object *objnow = list->head; objnow != NULL; objnow = objnow->next) {
        if (objnow == object) {
            match = 1;
            if (--objnow->ref_count == 0) {
                // unlink first so no other thread can find it, then free
                unlink_from_list(list, objnow);
                free(objnow);
                match = -1;
            }
            break;
        }
    }

    unlock_list(list);

    return match;
}
```
The mutex/lock method above on the list might also be done with RCU, but that's a bit more complicated.

Of course, "list" here needn't be a simple linked list. It could be a B-tree or some other sort of container.
A notion: actually, when thinking about it, if an object is not attached to some sort of global/interthread list, the `ref_count` tends to lose its meaning. Or, more importantly, why would there be interthread contention on `ref_count`?

If we merely have some "floating" objects that are not on a list [or are on a local per-thread list], why would multiple threads be trying to up/down the `ref_count`, since it's more likely that a single thread would "own" the object at that point?

Otherwise, rearchitecting the system might be in order to make it more predictable/stable.
UPDATE:

> A thread may not bump the reference count unless it already has a reference, since a reference is needed to access the object.

By "having a reference", here, I presume that you mean the thread has done a `retain`, will do some stuff, and then do a `release`.

> Thus if the ref count hits zero, no thread is currently accessing the object nor can any thread do so in the future. Thus it's safe to destroy it.

It may be safe to destroy it, but there is no interlock against multiple threads accessing data [non-lock] cells within the object and colliding.
The problem is having a subthread do the `free`.

Consider that we have a main thread that creates an object `obj1`, which gets handed off to two threads `tA` and `tB`, which refer to it internally as `objA` and `objB` respectively. The main thread starts `obj1` with a refcount of zero.

Consider the following timeline:
```
tA: retain(objA)
tA: // do stuff ...
tA: release(objA)
```
The object refcount is now zero and the memory area has been freed. Any further access is invalid. `tB` may not access the memory area for `obj1` in any way.
Now, we do [if we choose to ignore that]:
```
tB: retain(objB)
tB: // do stuff ...
tB: release(objB)
```
`tB`'s release will see the refcount go to zero and will do the `free`. This is a double free of `obj1`.

But `tB` can't even do the `retain`, because the memory for `obj1` may have been reallocated by another thread: (1) the main thread, for an `obj2`, or (2) another thread `tX` that uses the memory for a completely unrelated purpose.

In case (1), `tB`'s `objB` is now changing `obj2` instead of `obj1`. In case (2), `objB` is scribbling on `tX`'s unrelated memory area. Even a momentary inc/dec is disastrous.

So, in the above, there are race conditions, access to already freed memory, double frees, and writing to (e.g.) `objtype_x` as if it were `objtype_a`.
So, what if we have the main thread initialize with a refcount of one instead of zero?

Now, things work better. Race conditions are eliminated. But `tA` and `tB` will never see the refcount drop below one, so neither of them will ever do a `free`. So, having the individual threads do the `free` is a moot point.

The main thread will have to do the `free`, which would be safe. But, main has no way to know what state `obj1` is in. That is, has it been processed by `tA`, `tB`, or both?
So, maybe the object needs a `done` mask that gets OR'ed [atomically] with `1 << tA` and `1 << tB`, and main will look at this to know when it may do the `free`.
Or, if the main thread knows that only the two threads `tA` and `tB` will access the object, it could initialize the refcount to two and the two threads could just do a `release` when they are done with the object.
This doesn't work too well if `tB` decides that, after doing its own processing, it needs to send the object off to `tC`.

And with just a refcount, if the given object must be processed by `tA` before `tB`, there is no way to ensure this.
Architecturally, this whole system might work better if each thread had an input queue/list [that is mutex locked]. The main thread creates an object and queues it to `tA`. `tA` dequeues it, does work, and enqueues it to `tB`. Each thread might do a "Y" fork. That is, `tA` looks at the object and decides to send it to `tC`, bypassing `tB` entirely. Eventually, one of the threads will queue the object back to the main thread (i.e. to the free list for spent objects, or to post a result to main (e.g. a form of map/reduce)).
Putting an object on a [reusable] free list (vs. doing `free`) eases things a bit, because we don't have the "rug pull" effect of doing a `free` [with immediate `malloc`], so we can store some state information in the object that stays around even if the object is "idle".

So, we have the effect of an interthread pipeline system. One of the virtues of that approach [which I've used successfully in shipping production systems] is that once an object is queued to a thread, the thread "owns" the object and most of the access needn't be atomic.