I have an embedded board with a PowerPC 5200 running real-time Linux, kernel version 2.6.33.
My application uses one Linux high-resolution timer for alarms. Occasionally this timer does not expire. The problem is very rare: on a given system, many months can pass between occurrences.
The timer is set with `timer_settime` using an absolute time.
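For reference, arming a one-shot POSIX timer with an absolute expiry looks roughly like the sketch below. The helper name `arm_absolute`, the millisecond delay parameter, and passing the clock in as an argument are assumptions for illustration; the post does not say which clock or notification method the application uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: arm timer `t` (created on clock `clk`) to fire once,
 * `delay_ms` milliseconds from now, using an absolute expiry time. */
static int arm_absolute(timer_t t, clockid_t clk, long delay_ms)
{
    struct itimerspec its;
    memset(&its, 0, sizeof its);                /* it_interval = 0: one-shot */

    if (clock_gettime(clk, &its.it_value) != 0)
        return -1;

    its.it_value.tv_sec  += delay_ms / 1000;
    its.it_value.tv_nsec += (delay_ms % 1000) * 1000000L;
    if (its.it_value.tv_nsec >= 1000000000L) {  /* normalise nanoseconds */
        its.it_value.tv_sec  += 1;
        its.it_value.tv_nsec -= 1000000000L;
    }

    /* TIMER_ABSTIME: it_value is a point on the clock, not an interval. */
    return timer_settime(t, TIMER_ABSTIME, &its, NULL);
}
```

Note that `timer_gettime` still reports the time remaining as a relative value, even when the timer was armed with `TIMER_ABSTIME`.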
I made some interesting observations when the timer didn't expire:
- `timer_gettime` returns a remaining time of 1 ns.
- I checked the active timers by displaying `/proc/timer_list`, and the list didn't show this timer among the active timers.
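A small diagnostic helper along the following lines could be used to log the first observation from inside the application. It is only a sketch: the enum and function names are hypothetical, and it treats a reading of exactly 1 ns as "expired but not yet delivered", which in principle could also be a genuine 1 ns remainder.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical labels for what timer_gettime() reports. */
enum alarm_state {
    ALARM_DISARMED,         /* it_value == 0: timer is not armed            */
    ALARM_EXPIRY_PENDING,   /* it_value == 1 ns: expired, not yet delivered */
    ALARM_COUNTING          /* any other value: still counting down         */
};

/* Pure classification of an it_value read back from the kernel. */
static enum alarm_state classify_remaining(const struct itimerspec *cur)
{
    if (cur->it_value.tv_sec == 0 && cur->it_value.tv_nsec == 0)
        return ALARM_DISARMED;
    if (cur->it_value.tv_sec == 0 && cur->it_value.tv_nsec == 1)
        return ALARM_EXPIRY_PENDING;
    return ALARM_COUNTING;
}

/* Query a timer and classify it; returns -1 if timer_gettime() fails. */
static int alarm_state_of(timer_t t)
{
    struct itimerspec cur;
    if (timer_gettime(t, &cur) != 0)
        return -1;
    return (int)classify_remaining(&cur);
}
```

A watchdog thread could call `alarm_state_of` periodically and dump `/proc/timer_list` as soon as it sees `ALARM_EXPIRY_PENDING` for longer than expected.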
I looked into the Linux source and found a possible scenario:
`timer_gettime` ends up in `common_timer_get` (`posix-timers.c`). `common_timer_get` returns `it_value.tv_nsec = 1` if the timer is active and the remaining time is <= 0. This means that the timer has counted down and its state must be 'enqueued' or 'callback'.
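The branch described above can be modelled in user space roughly as follows. This is a simplified sketch, not the kernel code: the function name and parameters are invented for illustration, interval-timer forwarding is omitted, and the `SIGEV_NONE` special case is an assumption about the surrounding code that should be checked against the actual `posix-timers.c` in the 2.6.33 tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified user-space model (NOT kernel code) of the value that
 * common_timer_get() reports as remaining time, in nanoseconds.
 * Interval-timer forwarding is deliberately left out. */
static int64_t model_remaining_ns(bool hrtimer_is_active, bool is_sigev_none,
                                  int64_t interval_ns,
                                  int64_t expires_ns, int64_t now_ns)
{
    /* One-shot timer that is no longer active: reported as disarmed. */
    if (interval_ns == 0 && !hrtimer_is_active && !is_sigev_none)
        return 0;

    int64_t remaining = expires_ns - now_ns;
    if (remaining <= 0) {
        /* Expired but still active: report 1 ns rather than 0, because
         * 0 would mean "disarmed". Assumed: SIGEV_NONE timers report 0. */
        return is_sigev_none ? 0 : 1;
    }
    return remaining;
}
```

In this model, a reading of 1 ns requires the underlying timer to still count as active after its expiry time has passed, which matches the 'enqueued' or 'callback' reasoning.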
I suppose that it is in state 'callback', which means it is executing inside `__run_hrtimer` (`hrtimer.c`). `__run_hrtimer` calls `__remove_hrtimer`, which removes the timer from the active timer list before it changes the timer state from 'enqueued' to 'callback'.
`__run_hrtimer` calls several functions between setting the timer state to 'callback' and the end of the function, where the 'callback' state is cleared: several Linux kernel functions as well as the callback function in the application. If execution hangs somewhere in this window, `timer_gettime` may return 1 ns while the timer is not on the active list.
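The suspected window can be illustrated with a toy state machine. This is a user-space model only, with invented names (`model_timer`, `model_run`, `probe`); it mirrors the sequence described above rather than the real `hrtimer.c` code, and it ignores locking entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model (NOT kernel code) of the states discussed above. */
enum { ST_INACTIVE = 0x00, ST_ENQUEUED = 0x01, ST_CALLBACK = 0x02 };

struct model_timer {
    unsigned state;                         /* ST_* flags                  */
    bool     on_active_list;                /* what timer_list would show  */
    bool   (*fn)(struct model_timer *);     /* returns true to restart     */
};

/* Any state other than inactive counts as "active". */
static bool model_active(const struct model_timer *t)
{
    return t->state != ST_INACTIVE;
}

static void model_enqueue(struct model_timer *t)
{
    t->on_active_list = true;
    t->state |= ST_ENQUEUED;
}

/* Follows the sequence described for __run_hrtimer(). */
static void model_run(struct model_timer *t)
{
    t->on_active_list = false;      /* removed from the active list ...   */
    t->state = ST_CALLBACK;         /* ... then marked as 'callback'      */

    bool restart = t->fn ? t->fn(t) : false;    /* the window in question */

    if (restart)
        model_enqueue(t);
    t->state &= ~ST_CALLBACK;       /* 'callback' cleared at the very end */
}

/* Example callback: records what an observer would see mid-callback. */
static bool seen_active, seen_listed;
static bool probe(struct model_timer *t)
{
    seen_active = model_active(t);
    seen_listed = t->on_active_list;
    return false;                   /* one-shot: do not restart */
}
```

Running `model_run` on an enqueued timer with `probe` as its callback leaves `seen_active` true and `seen_listed` false, which is the same combination as the two observations: still reported as active, yet absent from the active list.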
I have checked the callback function in my application. It signals a semaphore and sets the timer again on the same thread. I can't see why that should not work.
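To make the discussion concrete, a callback of that shape might look like the sketch below. Everything here is an assumption: the post does not show the real code, so the names (`alarm_sem`, `alarm_timer`, `on_alarm`, `alarm_rearm`, `alarm_start`), the 100 ms period, the use of `CLOCK_MONOTONIC`, and `SIGEV_THREAD` delivery are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Hypothetical application state; none of these names are from the post. */
static sem_t   alarm_sem;
static timer_t alarm_timer;
static const long ALARM_PERIOD_MS = 100;    /* assumed re-arm delay */

/* Re-arm the one-shot timer with an absolute expiry ALARM_PERIOD_MS ahead. */
static int alarm_rearm(void)
{
    struct itimerspec its;
    memset(&its, 0, sizeof its);
    if (clock_gettime(CLOCK_MONOTONIC, &its.it_value) != 0)
        return -1;
    its.it_value.tv_nsec += ALARM_PERIOD_MS * 1000000L;
    while (its.it_value.tv_nsec >= 1000000000L) {
        its.it_value.tv_sec  += 1;
        its.it_value.tv_nsec -= 1000000000L;
    }
    return timer_settime(alarm_timer, TIMER_ABSTIME, &its, NULL);
}

/* Notification function: wake the waiter, then set the timer again. */
static void on_alarm(union sigval sv)
{
    (void)sv;
    sem_post(&alarm_sem);
    alarm_rearm();
}

/* Create the timer with SIGEV_THREAD delivery (an assumption) and arm it. */
static int alarm_start(void)
{
    struct sigevent sev;
    memset(&sev, 0, sizeof sev);
    sev.sigev_notify          = SIGEV_THREAD;
    sev.sigev_notify_function = on_alarm;

    if (sem_init(&alarm_sem, 0, 0) != 0)
        return -1;
    if (timer_create(CLOCK_MONOTONIC, &sev, &alarm_timer) != 0)
        return -1;
    return alarm_rearm();
}
```

In a sketch like this the return value of `alarm_rearm` is ignored inside `on_alarm`; logging a failed `timer_settime` there would be one cheap way to rule out a silent re-arm failure.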
Has anyone seen a similar case?
Does anyone have an idea of what is going wrong here?