
I'm trying to implement an unprivileged test case for unbounded priority inversion in the absence of priority-inheritance mutexes, using SCHED_IDLE. The test works with SCHED_FIFO and different realtime priorities (deadlocking with a non-PI mutex, resolving immediately with a PI mutex). To include it in a test suite that will run without realtime privileges, I'd like to use SCHED_IDLE instead, with the "medium" and "high" priority threads both being SCHED_OTHER. (In that case it's not really priority "inversion", but the concept should still work: the "medium" thread should preclude execution of the "low" one.)

Unfortunately, the test fails to differentiate between PI and non-PI mutexes; it makes forward progress either way. Apparently the SCHED_IDLE task is running even when there is another runnable task. CPU affinity has been set to bind them all to the same core so that the low priority task can't migrate to a different core to run. And I'm aware that SCHED_IDLE tasks are supposed to run with elevated privileges while in kernelspace to prevent kernelspace priority inversion, so I've tried ensuring that the "low" thread doesn't enter kernelspace by making it busy-loop in userspace, and strace shows no indication that it's making a syscall during the time it should not be making forward progress.

Does Linux's SCHED_IDLE just allow idle tasks to run when the core is not actually idle? Or is there something else I might be missing?

Here is the test code, slightly adapted so that it can be run either in realtime (SCHED_FIFO) mode or in SCHED_IDLE mode (pass any command-line argument to select SCHED_IDLE):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <semaphore.h>

sem_t sem;

/* "Low" thread: takes the mutex, releases the other two threads, then
   holds the mutex for 100ms before unlocking it. */
void *start1(void *p)
{
    pthread_mutex_lock(p);
    sem_post(&sem);
    sem_post(&sem);
    usleep(100000);
    pthread_mutex_unlock(p);
    return 0;
}

/* "Medium" thread: spins on trylock for up to ~5s without blocking,
   keeping the CPU busy while the low thread holds the mutex. */
void *start2(void *p)
{
    sem_wait(&sem);
    time_t t0 = time(0);
    while (pthread_mutex_trylock(p)) {
        if (time(0)>t0+5) return 0;
    }
    pthread_mutex_unlock(p);
    return 0;
}

/* "High" thread: blocks on the mutex with a 5s timeout and reports
   failure if the low thread never gets to release it. */
void *start3(void *p)
{
    sem_wait(&sem);
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += 5;
    int r;
    if ((r = pthread_mutex_timedlock(p, &ts))) {
        printf("failed: %d %s\n", r, strerror(r));
    } else {
        pthread_mutex_unlock(p);
    }
    return 0;
}

int main(int argc, char **argv)
{
    /* Any command-line argument selects SCHED_IDLE mode. */
    int policy = argc>1 ? SCHED_IDLE : SCHED_FIFO;
    int a = sched_get_priority_min(policy);
    pthread_attr_t attr;
    pthread_t t1,t2,t3;
    struct sched_param param = {0};

    /* Pin to CPU 0; the threads created below inherit this affinity. */
    cpu_set_t set = {0};
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    if (pthread_setaffinity_np(pthread_self(), sizeof set, &set)) return 1;

    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, policy);

    pthread_mutexattr_t ma;
    pthread_mutexattr_init(&ma);
    /* PI mutex; without this call the protocol is PTHREAD_PRIO_NONE. */
    pthread_mutexattr_setprotocol(&ma, PTHREAD_PRIO_INHERIT);
    pthread_mutexattr_settype(&ma, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_t mtx;
    pthread_mutex_init(&mtx, &ma);

    sem_init(&sem, 0, 0);

    /* Medium (t2) and high (t3): SCHED_FIFO at min+1 and min+2, or default
       (inherited SCHED_OTHER) attributes in SCHED_IDLE mode. */
    param.sched_priority = a+1;
    pthread_attr_setschedparam(&attr, &param);
    if (pthread_create(&t2, policy==SCHED_IDLE ? 0 : &attr, start2, &mtx)) return 1;

    param.sched_priority = a+2;
    pthread_attr_setschedparam(&attr, &param);
    if (pthread_create(&t3, policy==SCHED_IDLE ? 0 : &attr, start3, &mtx)) return 1;

    /* Low (t1): SCHED_FIFO at the minimum priority, or SCHED_IDLE. */
    param.sched_priority = a;
    pthread_attr_setschedparam(&attr, &param);
    if (pthread_create(&t1, &attr, start1, &mtx)) return 1;

    pthread_join(t1, 0);
    pthread_join(t2, 0);
    pthread_join(t3, 0);
    return 0;
}
```
R.. GitHub STOP HELPING ICE

1 Answer


> Does Linux's SCHED_IDLE just allow idle tasks to run when the core is not actually idle? Or is there something else I might be missing?

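This is correct. SCHED_IDLE gives tasks a very low but non-zero weighting: a CFS load weight of 3, versus 15 for a nice 19 task and 1024 for a nice 0 task. That is roughly 80% less CPU time than a nice 19 task, so a SCHED_IDLE task still gets a small share of a core that has other runnable tasks, rather than running only when the core is idle.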

caf
  • Thanks. This makes it pointless not only for my purpose in testing, but also for its intended purpose: ensuring that the demoted task **never** takes CPU time away from any normal task. – R.. GitHub STOP HELPING ICE Apr 04 '19 at 01:02
  • I believe the idea is that if you want guarantees like "never", you use realtime policies. – caf Apr 04 '19 at 03:05
  • Run everything on the system as `SCHED_RR`? The problem `SCHED_IDLE` is supposed to solve is that you want a "below everything" priority without having to explicitly make "everything" realtime/above-idle. – R.. GitHub STOP HELPING ICE Apr 04 '19 at 03:40
  • I wonder if that would actually be possible... Have `init` start everything as `SCHED_RR` at prio 10 or something, and let processes that want to be lower priority drop to lower... Presumably dropping to lower can be done without privileges? – R.. GitHub STOP HELPING ICE Apr 04 '19 at 03:41
  • Well I was more considering that if you have such a requirement it tends to be because you have a known set of tasks you never want interrupted. For everything else the fact that you might at worst lose 1 in ~340 cycles to the "idle" task seems like a non-problem. It probably should have been called `SCHED_NICEST` or something ;) – caf Apr 04 '19 at 23:15