1

I am trying to implement a user level thread library and need to schedule threads in a round robin fashion. I am currently trying to make switching work for 2 threads that I have created using makecontext, getcontext and swapcontext. setitimer with ITIMER_PROF value is used and sigaction is assigned a handler to schedule a new thread whenever the SIGPROF signal is generated. However, the signal handler is not invoked and the threads therefore never get scheduled. What could be the reason? Here are some snippets of the code:

    void userthread_init(long period){
    /*long time_period = period;
//Includes all the code like initializing the timer and attaching the signal
// handler function "schedule()" to the signal SIGPROF. 
// create a linked list of threads - each thread's context gets added to the list/updated in the list
// in userthread_create*/

    struct itimerval it;
    struct sigaction act;
    act.sa_flags = SA_SIGINFO;
    act.sa_sigaction = &schedule;
    sigemptyset(&act.sa_mask);
    sigaction(SIGPROF,&act,NULL);
    time_period = period;
    it.it_interval.tv_sec = 4;
    it.it_interval.tv_usec = period;
    it.it_value.tv_sec = 1;
    it.it_value.tv_usec = 100000;
    setitimer(ITIMER_PROF, &it,NULL);
    //for(;;);
}

The above code is to initialize a timer and attach a handler schedule to the signal handler. I am assuming the signal SIGPROF will be given to the above function which will invoke the scheduler() function. The scheduler function is given below:

void schedule(int sig, siginfo_t *siginf, ucontext_t* context1){
    printf("\nIn schedule");
    ucontext_t *ucp = NULL;
    ucp = malloc(sizeof(ucontext_t));
    getcontext(ucp);
    //ucp = &sched->context;
    sched->context = *context1;
    if(sched->next != NULL){

        sched = sched->next;
    }
    else{
        sched = first;
    }
    setcontext(&sched->context);
}   

I have a queue of ready threads in which their respective contexts are stored. Each thread should get scheduled whenever setcontext instruction is executed. However, scheduler() is not invoked! Can anyone please point out my mistake??

Stuart Marks
  • 127,867
  • 37
  • 205
  • 259

1 Answers1

2

Completely revising this answer after looking at the code. There are a few issues:

  • There are several compiler warnings
  • You are never initializing your thread ID's, not outside or inside your thread creation method, so I'm surprised the code even works!
  • You are reading from uninitialized memory in your gtthread_create() function, I tested on both OSX & Linux, on OSX it crashes, on Linux by some miracle it's initialized.
  • In some places you call malloc(), and overwrite it with a pointer to something else - leaking memory
  • Your threads don't remove themselves from the linked list after they've finished, so weird things are happening after the routines finish.

When I add in the while(1) loop, I do see schedule() being called and output from thread 2, but thread 1 vanishes into fat air (probably because of the uninitialized thread ID). I think you need to have a huge code cleanup.

Here's what I'd suggest:

  • Fix ALL of your compiler warnings — even if you think they don't matter, the noise may lead to you missing things (such as incompatible pointer types, etc). You're compiling with -Wall & -pedantic; that's a good thing - so now take the next step & fix them.
  • Put \n at the END of your printf statements, not the start — The two threads ARE outputting to stdout, but it's not getting flushed so you can't see it. Change your printf("\nMessage"); calls to printf("Message\n");
  • Use Valgrind to detect memory issues — valgrind is the single most amazing tool you will ever use for C/C++ development. It's available through apt-get & yum. Instead of running ./test1, run valgrind ./test1 and it will highlight memory corruption, memory leaks, uninitialized reads, etc. I can't stress this enough; Valgrind is amazing.
  • If a system call returns a value, check it — in your code, check the return values to all of getcontext, swapcontext, sigaction, setitimer
  • Only call async-signal-safe methods from your scheduler (or any signal handler) — so far you've fixed malloc() and printf() from inside your scheduler. Check out the signal(7) man page - see "Async-signal-safe functions"
  • Modularize your code — your linked list implementation could be tidier, and if it was separated out, then 1) your scheduler would have less code & be simpler, and 2) you can isolate issues in your linked list without having to debug scheduler code.

You're almost there, so keep at it - but keep in mind these three simple rules:

  1. Clean as you go
  2. Keep the compiler warnings fixed
  3. When weird things are happening, use valgrind

Good luck!


Old answer:

You should check the return value of any system call. Whether or not it helps you find the answer, you should do it anyway :)

Check the return value of sigaction(), if it's -1, check errno. sigaction() can fail for a few reasons. If your signal handler isn't getting fired, it's possible it hasn't been set up.

Edit: and make sure you check the return of setitimer() too!

Edit 2: Just a thought, can you try getting rid of the malloc()? malloc is not signal safe. eg: like this:

void schedule(int sig, siginfo_t *siginf, ucontext_t* context1){
    printf("In schedule\n");
    getcontext(&sched->context);
    if(sched->next != NULL){
        sched = sched->next;
    }
    else{
        sched = first;
    }
    setcontext(&sched->context);
}   

Edit 3: According to this discussion, you can't use printf() inside a signal handler. You can try replacing it with a call to write(), which is async-signal safe:

// printf("In schedule\n");
const char message[] = "In schedule\n";
write( 1, message, sizeof( message ) );
nevelis
  • 736
  • 6
  • 17
  • I just checked the return values of both setitimer() and sigaction(). They both return 0. "If your signal handler isn't getting fired, it's possible it hasn't been set up". I have defined the signal handler so that its according to the format expected by sigaction. I am not sure what else I have to look into :( – user2984490 Feb 03 '14 at 06:01
  • Bugger :( Perhaps try `kill -SIGPROF` from the command line to send a signal? Try ruling out the timer as a possibility. – nevelis Feb 03 '14 at 06:15
  • Can you tell me how I can send a signal from the command line? I need to mention the pid of the process, correct? I am unsure how to obtain the pid and send a signal to that process from command line. – user2984490 Feb 03 '14 at 06:32
  • If your binary is called 'test', then do `pkill -SIGPROF test`. – nevelis Feb 03 '14 at 06:40
  • Hmm, just a thought... Can you put the \n in the printf statement after? So that it reads `printf("In schedule\n");`? stdout is buffered and will only flush on a newline (unless you call `fflush(stdout)` explicitly), so if it's hung in schedule(), you may not see it. – nevelis Feb 03 '14 at 06:42
  • I tried inserting newline character at the end. Still did not change things! What should I see on the command prompt when I give the pkill command? I do not see any output on the command prompt when I do that. – user2984490 Feb 03 '14 at 06:51
  • Check out the edit with removing the malloc() call in the signal handler. – nevelis Feb 03 '14 at 06:55
  • Tried that as well :( the edit without the malloc() call I mean. I am starting to think the structure 'act' is not populated right. Could that be the reason? – user2984490 Feb 03 '14 at 07:02
  • It could be the call to printf(), try using write() as above. I think we should move this to chat ;) – nevelis Feb 03 '14 at 07:34
  • I am a newbie so I am not sure if I can move this to chat :( I used write() as mentioned as well. Scheduler() is still not invoked. Is the signal SIGPROF getting blocked implicity? – user2984490 Feb 03 '14 at 08:10
  • Unfortunately you don't have enough rep for chat yet! If you can flick me a compiling version of your code to nevelis at gmail, I can take a look & see if I can figure out what the problem is. – nevelis Feb 03 '14 at 10:33