This problem is known as convoying: all other threads have to wait if a thread holding a lock is descheduled due to a time-slice interrupt or page fault. https://en.wikipedia.org/wiki/Lock_(computer_science)#Disadvantages
As is well known, if thread-1 locks a std::mutex and is then switched out, and during that time many other threads (2, 3, 4, ...) try to lock the same mutex, all of those threads block and wait until thread-1 is switched back in.
One solution is to use lock-free algorithms. But if a mutex is required, is there some way to avoid this situation?
How can I find out in advance, about 100 cycles ahead, that the thread is about to be switched out?
Or how can I have an exception raised about 100 cycles before the thread is switched out on Linux x86_64?
Or how can I make the thread keep running for a little longer (100 cycles)?
UPDATE:
I have 20 CPU cores, and my program has 40 threads divided into 2 parts:
- Part-1: 20 threads use the 1st shared resource, protected by std::mutex mtx1
- Part-2: 20 threads use the 2nd shared resource, protected by std::mutex mtx2
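To make the setup concrete, here is a minimal sketch of the two groups of threads. The names worker, run_two_parts, resource1 and resource2 are hypothetical, and the plain counters only stand in for the real shared resources, which are not shown in this question.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

std::mutex mtx1;              // protects resource1 (Part-1)
std::mutex mtx2;              // protects resource2 (Part-2)
std::uint64_t resource1 = 0;  // stand-in for the 1st shared resource (assumption)
std::uint64_t resource2 = 0;  // stand-in for the 2nd shared resource (assumption)

// Each worker repeatedly takes its group's mutex and touches the shared resource.
void worker(std::mutex& mtx, std::uint64_t& resource, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> guard(mtx);  // unlocks at end of scope
        ++resource;
    }
}

// Starts threads_per_part threads for each part, waits for all of them,
// and returns the final values of both resources.
std::pair<std::uint64_t, std::uint64_t> run_two_parts(int threads_per_part,
                                                      int iterations) {
    assert(threads_per_part > 0 && iterations >= 0);
    resource1 = 0;
    resource2 = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < threads_per_part; ++i) {
        threads.emplace_back(worker, std::ref(mtx1), std::ref(resource1), iterations);
        threads.emplace_back(worker, std::ref(mtx2), std::ref(resource2), iterations);
    }
    for (auto& t : threads) {
        t.join();
    }
    return {resource1, resource2};
}
```

Calling run_two_parts(20, 1000) gives 40 threads on two mutexes, which matches the layout described above: more runnable threads than cores, so the OS has to time-slice them.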
As is known, the operating system gives each thread a certain time quantum to run, after which it puts the thread to sleep and gives the freed core to the next thread, which then runs for its own time quantum.
Part-1: Sometimes (not often, but this case is critical for me) one of the 20 threads calls mtx1.lock(), starts working with the shared resource, and is then switched out (put to sleep) by the OS before it calls mtx1.unlock(), because the time quantum the OS allocated to this thread has expired. The OS switches this thread back in only after ~1 - 10 ms (up to 30 000 000 cycles). During this time each of the other 19 threads of Part-1 tries at least once to access the shared resource, which they do every 10 usec (30 000 cycles), but mtx1 is busy.
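The cycle counts quoted here appear to correspond to a clock of roughly 3 GHz (10 usec = 30 000 cycles). A minimal sketch of that arithmetic follows; the 3 GHz figure is an assumption inferred from the numbers above, and ns_to_cycles and kAssumedCpuHz are hypothetical names.

```cpp
#include <cstdint>

// Assumed clock rate, inferred from "10 usec = 30 000 cycles" (not a measured value).
constexpr std::uint64_t kAssumedCpuHz = 3000000000ULL;

// Converts a duration in nanoseconds to CPU cycles at the given clock rate.
constexpr std::uint64_t ns_to_cycles(std::uint64_t ns,
                                     std::uint64_t cpu_hz = kAssumedCpuHz) {
    return ns * cpu_hz / 1000000000ULL;
}

static_assert(ns_to_cycles(10000) == 30000, "10 usec is 30 000 cycles");
static_assert(ns_to_cycles(1000000) == 3000000, "1 ms is 3 000 000 cycles");
static_assert(ns_to_cycles(10000000) == 30000000, "10 ms is 30 000 000 cycles");
```

So at the assumed clock rate a 1 - 10 ms stall is between 3 000 000 and 30 000 000 cycles, i.e. 100 to 1000 times the 10 usec access period.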
Then each of the 19 threads of Part-1 goes to sleep, and the vacated CPU cores are taken by threads from Part-2. The OS sees that all cores are busy and does not wake the Part-1 thread that holds mtx1.
This case is rare, but when it occurs, all of Part-1 (20 threads) freezes for a whole 1 - 10 milliseconds (up to 30 000 000 cycles), which is unacceptable for the task.
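For reference, this is roughly how such a stall can be observed. It is a minimal sketch that only measures how long lock() blocks; it does not prevent the stall. The names LockWaitStats and timed_lock are hypothetical, and the 10 usec budget is the limit stated in this question.

```cpp
#include <chrono>
#include <cstdint>
#include <mutex>

// Per-thread statistics about time spent waiting for a mutex.
// Not thread-safe: each thread is assumed to keep its own instance.
struct LockWaitStats {
    std::chrono::nanoseconds budget;       // longest acceptable wait
    std::chrono::nanoseconds max_wait{0};  // longest wait seen so far
    std::uint64_t samples = 0;             // number of recorded waits
    std::uint64_t over_budget = 0;         // waits that exceeded the budget

    explicit LockWaitStats(std::chrono::nanoseconds b) : budget(b) {}

    void record(std::chrono::nanoseconds wait) {
        ++samples;
        if (wait > budget) ++over_budget;
        if (wait > max_wait) max_wait = wait;
    }
};

// Locks the mutex and returns how long the lock() call took.
// The caller owns the mutex afterwards and must unlock it.
inline std::chrono::nanoseconds timed_lock(std::mutex& mtx) {
    const auto start = std::chrono::steady_clock::now();
    mtx.lock();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
}
```

In a Part-1 thread this would be used as stats.record(timed_lock(mtx1)); followed by the work on the shared resource and mtx1.unlock();, with stats constructed as LockWaitStats stats(std::chrono::microseconds(10)). A non-zero over_budget count then indicates that the thread waited longer than 10 usec at least once.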
How can I ensure that Part-1 is never delayed by more than 10 microseconds (30 000 cycles)?