Design:
- A singleton that contains a 'recursive' mutex resource.
- 2 threads use this singleton to update/manage data.
- The singleton is created by whichever thread tries to access it first.
- Singleton creation takes a global lock to ensure we call mutex attr init and mutex init only once.
Sample code:
Both threads have an identical flow (just different data) and will call funcX() first.
instance() takes a global mutex lock to ensure only 1 instance of A gets created. It also has an additional (!_instance) check right after the lock to make sure we do not create the instance again.
#include <boost/thread/recursive_mutex.hpp>

class A
{
public:
    void funcA();
    void funcB();
private:
    // <members>
    boost::recursive_mutex m;
};

void A::funcA()
{
    m.lock();
    // <Do something>
    m.unlock();
}

void A::funcB()
{
    m.lock();
    // <Do something>
    m.unlock();
}

void funcX()
{
    Singleton::instance().funcA();
}

void funcY()
{
    Singleton::instance().funcB();
}
========================================================================
A& Singleton::instance()
{
    // <Global mutex lock>
    if (!_instance)
    {
        createInstance();
    }
    // <Global mutex unlock>
    return *_instance;  // _instance is tested with !, so it is treated as a pointer here
}
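For reference, a minimal compilable sketch of the design above. It rests on several assumptions that are not in the real code: std::recursive_mutex stands in for boost::recursive_mutex so that no external library is needed, the managed data is a plain counter, and the names g_instanceLock, counter, value() and runTwoThreads() are hypothetical. funcA() calls funcB() only to show why the mutex has to be recursive.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Stand-in for class A from the question (std:: instead of boost::).
class A {
public:
    void funcA() {
        std::lock_guard<std::recursive_mutex> guard(m);
        ++counter;
        funcB();  // re-locks m on the same thread; allowed for a recursive mutex
    }
    void funcB() {
        std::lock_guard<std::recursive_mutex> guard(m);
        ++counter;
    }
    int value() {
        std::lock_guard<std::recursive_mutex> guard(m);
        return counter;
    }
private:
    int counter = 0;          // hypothetical managed data
    std::recursive_mutex m;
};

// Hypothetical global lock guarding lazy creation of the instance.
std::mutex g_instanceLock;

class Singleton {
public:
    static A& instance() {
        // The lock is taken on every call, not only on the first one.
        std::lock_guard<std::mutex> guard(g_instanceLock);
        if (!_instance) {
            _instance = new A();  // intentionally never freed in this sketch
        }
        return *_instance;
    }
private:
    static A* _instance;
};
A* Singleton::_instance = nullptr;

void funcX() { Singleton::instance().funcA(); }
void funcY() { Singleton::instance().funcB(); }

// Runs the same flow on two threads and returns the final counter value.
int runTwoThreads(int iterations) {
    assert(iterations >= 0);
    auto flow = [iterations] {
        for (int i = 0; i < iterations; ++i) {
            funcX();  // adds 2 (funcA plus the nested funcB)
            funcY();  // adds 1
        }
    };
    std::thread t1(flow);
    std::thread t2(flow);
    t1.join();
    t2.join();
    return Singleton::instance().value();
}
```

In this sketch every call to instance() passes through the global lock, so the thread that did not create the instance still acquires that lock after the creating thread has released it, and only then touches the recursive mutex.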
Problem:
Very rarely, the first mutex lock call does not increment the __count variable (it stays 0), although the __owner (thread id), __nusers (1), and __lock (2) attributes are all updated.
Whenever I try to log the __kind attribute, the issue does not happen.
Initial findings:
When the issue happens, both threads are trying to initialize the singleton (and with it the mutex). Because of the global lock within singleton creation, only 1 thread proceeds, creates the mutex, and initializes it to the recursive type.
The thread that then locks the mutex appears to be looking at outdated memory, which leads it to think the mutex type is Normal (__kind = 0). The mutex lock returns success.
When the subsequent unlock is called, the mutex type is now seen as recursive, and because the pthread unlock does not check for 0, it ends up decrementing __count to INT_MAX.
else if (__builtin_expect (PTHREAD_MUTEX_TYPE (mutex)
                           == PTHREAD_MUTEX_RECURSIVE_NP, 1))
  {
    /* Recursive mutex. */
    if (mutex->__data.__owner != THREAD_GETMEM (THREAD_SELF, tid))
      return EPERM;
    if (--mutex->__data.__count != 0)
      /* We still hold the mutex. */
      return 0;
    goto normal;
  }
Unlock also returns success and the mutex is never released, causing the other thread to be in wait state forever.
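To make the suspected sequence concrete, here is a small single-threaded model of it. This is not glibc code: ModelMutex, modelLock(), modelUnlock() and replayScenario() are hypothetical, the field types are assumptions (the counter is modelled as unsigned int), and the lock word is simplified to 0/1. It only replays "lock while seeing kind = normal, unlock while seeing kind = recursive".

```cpp
#include <cassert>
#include <climits>

// Simplified model of the fields named above; types are assumptions.
struct ModelMutex {
    int lock = 0;             // 0 = free, 1 = held (the real lock word has more states)
    unsigned int count = 0;   // recursion depth, only used on the recursive path
    int owner = 0;            // id of the owning thread, 0 = none
    unsigned int nusers = 0;
};

const int KIND_NORMAL = 0;
const int KIND_RECURSIVE = 1;

// Lock path chosen according to the kind the calling thread observes.
int modelLock(ModelMutex& m, int tid, int seenKind) {
    if (seenKind == KIND_RECURSIVE) {
        if (m.owner == tid) {   // already ours: only bump the depth
            ++m.count;
            return 0;
        }
        m.lock = 1;
        m.count = 1;
    } else {
        m.lock = 1;             // normal path never touches count
    }
    m.owner = tid;
    ++m.nusers;
    return 0;
}

// Unlock path, mirroring the shape of the recursive branch quoted above.
int modelUnlock(ModelMutex& m, int tid, int seenKind) {
    if (seenKind == KIND_RECURSIVE) {
        if (m.owner != tid) {
            return 1;           // stand-in for EPERM
        }
        if (--m.count != 0) {
            return 0;           // "we still hold the mutex": nothing is released
        }
    }
    m.owner = 0;                // the "normal" release path
    --m.nusers;
    m.lock = 0;
    return 0;
}

// Replays the suspected sequence and returns the resulting state.
ModelMutex replayScenario() {
    ModelMutex m;
    const int tid = 42;         // arbitrary thread id for the model
    int rc = modelLock(m, tid, KIND_NORMAL);      // stale view: count stays 0
    assert(rc == 0);
    rc = modelUnlock(m, tid, KIND_RECURSIVE);     // fresh view: 0 is decremented
    assert(rc == 0);
    return m;
}
```

In this model both calls report success, yet after replayScenario() the lock word is still 1, the owner is still set, and the counter has wrapped from 0 to the largest unsigned value, which matches the state in which the other thread would wait forever.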
What are the possible reasons for this scenario to happen? Can __kind be corrupted somehow?