Priority inversion is a common and long-known problem. Anyone who has dealt with OS process scheduling, especially under real-time requirements, is familiar with it. There are a few well-known solutions to the problem, each with its pros and cons:
- Disabling all interrupts to protect critical sections
- A priority ceiling
- Priority inheritance
- Random boosting
It doesn't matter which method is chosen to cope with priority inversion; all of them are relatively easy to implement in the OS kernel, provided that applications use a well-defined interface to synchronize access to shared resources. For instance, if a process locks a mutex via `pthread_mutex_lock`, the OS is well aware of that fact, because deep down this function makes a system call (`futex` on Linux) whenever the lock is contended. When the kernel serves that request, it has a complete and clear picture of who is waiting on what, and can decide how best to handle priority inversion.
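To make the "kernel sees everything" point concrete, here is a minimal sketch of how a futex-based mutex might look on Linux. The names `toy_mutex_lock`/`toy_mutex_unlock` are mine, and this is heavily simplified compared to what glibc actually does (glibc avoids the wake syscall on the uncontended path), but the shape is the same: an atomic on the fast path, `futex(2)` on the slow path.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

static void toy_mutex_lock(atomic_int *m) {
    int expected = 0;
    /* Fast path: a single atomic instruction, no kernel involvement. */
    while (!atomic_compare_exchange_strong(m, &expected, 1)) {
        /* Slow path: sleep in the kernel until *m stops being 1.
         * This syscall is what gives the scheduler its complete
         * picture of who is waiting on what. */
        syscall(SYS_futex, m, FUTEX_WAIT, 1, NULL, NULL, 0);
        expected = 0;   /* retry the CAS after being woken */
    }
}

static void toy_mutex_unlock(atomic_int *m) {
    atomic_store(m, 0);
    /* Wake one waiter, if any (simplified: always makes the syscall). */
    syscall(SYS_futex, m, FUTEX_WAKE, 1, NULL, NULL, 0);
}
```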
Now imagine that the kernel doesn't know when a process is locking or unlocking a mutex. This can happen, for instance, if the mutex is implemented with atomic CPU instructions alone (as in "lock-free" algorithms). It then becomes possible for a low-priority process to grab the lock and be preempted by a higher-priority task. The higher-priority task, once scheduled, would simply burn the CPU trying to acquire the spin-lock it can never get. A deadlock like that would render the whole system useless.
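For contrast, here is a minimal test-and-set spinlock of the kind described above, sketched with C11 atomics. Nothing in it ever enters the kernel, so the scheduler cannot tell this busy loop apart from useful work:

```c
#include <stdatomic.h>

typedef struct {
    atomic_flag locked;   /* initialize with ATOMIC_FLAG_INIT */
} spinlock_t;

static void spin_lock(spinlock_t *l) {
    /* If a preempted lower-priority task holds the lock on a
     * strict-priority uniprocessor, this loop never terminates. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;  /* busy-wait: pure CPU burn, invisible to the kernel */
}

static void spin_unlock(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```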
Given the scenario above, and the fact that we cannot change the program to stop using raw atomic operations for synchronizing access to shared resources, the problem boils down to detecting when the code is trying to do so.
I have a few admittedly vague heuristic ideas, each of which is both hard to implement and prone to false positives. Here they are:
- Sample the program counter register periodically and try to detect that the code is simply burning CPU in a tight loop. If the process is spotted at the same place N times in a row, suspend it and give lower-priority processes a chance to run and unlock the mutex (see the first sketch after this list). This method is far from ideal and can produce far too many false positives.
- Impose a hard limit on how long a process can run. This immediately sacrifices the scheduler's hard real-time guarantees, but it could work. The problem, however, is that in the "deadlock" case the high-priority process would still waste its entire time slice trying to acquire a busy resource.
- I don't know if this is even possible, but another idea is to intercept/interpose atomic CPU instructions so that the scheduler becomes aware of locking/unlocking attempts; essentially, turning atomic CPU operations into a kind of system call, much like a virtual page mapping is created when the MMU signals a page fault (a user-space analogue is sketched below).
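To illustrate the first idea, here is a rough sketch of the PC-sampling heuristic. All the names (`task_t`, `on_timer_tick`, `sched_yield_to_lower`) are hypothetical scheduler hooks invented for illustration, not a real API:

```c
#include <stdint.h>

typedef struct task {
    uintptr_t last_pc;        /* PC sampled at the previous tick */
    unsigned  same_pc_ticks;  /* consecutive ticks near the same PC */
} task_t;

/* Hypothetical scheduler primitive: temporarily let lower-priority
 * tasks run so a preempted lock holder can make progress. */
void sched_yield_to_lower(task_t *t);

#define SPIN_TICKS 8    /* ticks near one PC before we suspect a spin */
#define PC_WINDOW  64   /* PCs within 64 bytes count as "the same loop" */

void on_timer_tick(task_t *t, uintptr_t pc) {
    /* Unsigned wraparound covers both directions of the comparison. */
    if (pc - t->last_pc < PC_WINDOW || t->last_pc - pc < PC_WINDOW)
        t->same_pc_ticks++;
    else
        t->same_pc_ticks = 0;
    t->last_pc = pc;

    if (t->same_pc_ticks >= SPIN_TICKS) {
        t->same_pc_ticks = 0;
        sched_yield_to_lower(t);   /* suspected spin: back off */
    }
}
```

And for the third idea, a user-space analogue of the page-fault trick does exist: place the lock word alone on a page, arm the page with `mprotect(PROT_NONE)`, and catch the resulting SIGSEGV. The sketch below only logs the first trapped access and then disarms the page; a real implementation would have to single-step the faulting instruction and re-arm the page to catch every access, which is the hard part:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static long pg;                  /* page size, set once in main */
static atomic_int *lock_word;    /* lives alone on a trap-armed page */

/* Any access to the armed page -- including an atomic instruction --
 * lands here.  We log it and open the page so the faulting instruction
 * restarts and succeeds. */
static void on_fault(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)ctx;
    uintptr_t page = (uintptr_t)si->si_addr & ~(uintptr_t)(pg - 1);
    if (page == (uintptr_t)lock_word) {
        write(2, "trapped atomic access to lock word\n", 35);
        mprotect((void *)page, pg, PROT_READ | PROT_WRITE);
    } else {
        _exit(1);                /* a genuine crash, not our trap */
    }
}

int main(void) {
    pg = sysconf(_SC_PAGESIZE);
    lock_word = mmap(NULL, pg, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    struct sigaction sa = { .sa_sigaction = on_fault, .sa_flags = SA_SIGINFO };
    sigaction(SIGSEGV, &sa, NULL);

    mprotect((void *)lock_word, pg, PROT_NONE);   /* arm the trap */
    atomic_fetch_or(lock_word, 1);                /* faults, gets trapped */
    printf("lock word is now %d\n", atomic_load(lock_word));
}
```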
What do you think of the above ideas? What other ways of detecting such code can you think of?