
I have a question about what seems to be weird behavior of wait_event and wake_up on an Android embedded platform (Exynos5dual-based) with a preemptive Linux 3.0 kernel. It does not happen on a normal SMP laptop with a non-preemptive kernel (any version).

We have a Linux device driver with a classic sleeper/waker scenario, and here's what happens:

T0: taskA:
    if(!flag)
        wait_event_interruptible_timeout(wq, flag==true, timeout=0.5sec)

T1: (after a few msec) taskB: 
    atomic set flag
    wake_up_interruptible()

T2: (after timeout msec) taskA: 
    wait_event_interruptible_timeout expires (ret 0) instead of waking up at T1
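
Stripped down, the real code follows roughly this shape (names and details here are simplified stand-ins, not the actual driver; in the real code the flag is manipulated with atomic ops as described below):

    #include <linux/wait.h>
    #include <linux/sched.h>
    #include <linux/jiffies.h>

    static DECLARE_WAIT_QUEUE_HEAD(wq);
    static int flag;    /* real code: atomic bitops / atomic_t, see below */

    /* taskA: the sleeper */
    static long sleeper(void)
    {
        long ret = 1;

        if (!flag)
            ret = wait_event_interruptible_timeout(wq, flag,
                                                   msecs_to_jiffies(500));
        /* ret == 0 means the timeout expired, which is what we observe */
        return ret;
    }

    /* taskB: the waker, runs a few msec after taskA starts waiting */
    static void waker(void)
    {
        flag = 1;                       /* atomic set in the real code */
        wake_up_interruptible(&wq);
    }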

All reads and writes of flag are atomic, and we have gone from atomic bitops (kernel set_bit/test_bit), to a volatile atomic_t, to using memory barriers for each read/write of the atomic_t vars (according to this).

If taskA actually starts waiting (the wait_event_* kernel functions check the condition first, so it may not always sleep), then it waits for the full timeout instead of being woken up by taskB when the flag changes value and wake_up() is called.

We suspect that the two tasks run on different cores: core 1 deep-sleeps after wait_event_*() and cannot be woken up by the wake_up_interruptible() that occurs on core 2.

Does anyone know if this is true, or if something else is to blame?

NOTE: The issue seems to go away if we save the sleeper's task_struct pointer and then do wake_up_process(saved_ptr) before (and in addition to) wake_up_interruptible(). We find this less than optimal and wonder if there is a better way.
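
A rough sketch of that workaround (simplified, hypothetical names; wq and flag as in the sketch above):

    /* sleeper side: publish our task_struct before waiting */
    static struct task_struct *sleeper_task;

    static long sleeper(void)
    {
        sleeper_task = current;
        return wait_event_interruptible_timeout(wq, flag,
                                                msecs_to_jiffies(500));
    }

    /* waker side: kick the saved task explicitly, then do the normal wake_up */
    static void waker(void)
    {
        flag = 1;                       /* atomic set in the real code */
        if (sleeper_task)
            wake_up_process(sleeper_task);
        wake_up_interruptible(&wq);
    }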

cygnus
  • I don't know anything about Android, so I'm probably on the wrong track, but did you make sure that the wait_event in task A really occurs before the wake_up is called? Besides, my intuition is that the atomic flag is not enough to protect your code against threading issues. It's the whole block of flag test/set + wake_up calls that should be made atomic. – Alexandre Vinçon Feb 22 '13 at 07:06
  • the race between the two events (start of wait and wakeup) is a valid point. I'll have to get back to that with more testing. However, why would calling wake_up_process() before wake_up() mitigate this? Purely because of the change of timing? – cygnus Feb 22 '13 at 12:55
  • You might find this link useful: [Sleeping in the Kernel](http://www.linuxjournal.com/article/8144). Check paragraph called "Lost Wake-Up Problem". It also looks like you're missing a parameter for wake_up_interruptible(). – Alexandre Vinçon Feb 22 '13 at 20:09
  • thanks for the link, it seems to address the issue at hand (in all my googling I never entered "lost wakeup", and "missed" or "timer expires despite wakeup event" didn't help much). As for the wake_up_interruptible() parameters, this is pseudo-code; I can assure you the real code at least compiles..:) – cygnus Feb 23 '13 at 11:42
  • the article overcomes the problem by using lower-level primitives than wait_event. It suggests setting the current task state to interruptible before taking the protective spinlock, yet the Linux wait_event primitives don't do that (look at prepare_to_wait() inside the __wait_event_* macros). Yet on the next page, the article advocates the use of these higher-level primitives, saying that they handle the lost-wakeup problem. In any case it's a good idea to test replacing the Linux wait_event_*() functions with lower-level primitives (rough sketch below). – cygnus Feb 23 '13 at 17:27
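
For reference, the lower-level variant mentioned in the last comment would look roughly like this (untested sketch, names assumed; wq and flag as in the earlier sketch): the task state is set to TASK_INTERRUPTIBLE before the condition is checked, so a wake_up that races with the check is not lost.

    static long wait_for_flag(long timeout)
    {
        DEFINE_WAIT(wait);

        for (;;) {
            /* mark ourselves as sleeping *before* testing the condition */
            prepare_to_wait(&wq, &wait, TASK_INTERRUPTIBLE);
            if (flag || signal_pending(current) || timeout == 0)
                break;
            timeout = schedule_timeout(timeout);
        }
        finish_wait(&wq, &wait);
        return timeout;
    }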

0 Answers