How to sleep in the Linux kernel space?

Question

I have a kernel thread that is pinned to a specific CPU and runs with FIFO scheduling at the highest priority. The thread sleeps from time to time, and the sleep interval must be as precise as possible. With this in mind, what is the most precise way to sleep in kernel space?

score 34 · Answer 1 · answered Oct 07 '16 at 15:36

Here is a related excerpt from Documentation/timers/timers-howto.txt:

NON-ATOMIC CONTEXT:

You should use the *sleep[_range] family of functions. There are a few more options here, while any of them may work correctly, using the "right" sleep function will help the scheduler, power management, and just make your driver better :)

Backed by busy-wait loop:
udelay(unsigned long usecs)

Backed by hrtimers:
usleep_range(unsigned long min, unsigned long max)

Backed by jiffies / legacy_timers
msleep(unsigned long msecs)
msleep_interruptible(unsigned long msecs)

Unlike the *delay family, the underlying mechanism driving each of these calls varies, thus there are quirks you should be aware of.

SLEEPING FOR "A FEW" USECS ( < ~10us? )

Use udelay

Why not usleep?
On slower systems, (embedded, OR perhaps a speed-stepped PC!) the overhead of setting up the hrtimers for usleep may not be worth it. Such an evaluation will obviously depend on your specific situation, but it is something to be aware of.

SLEEPING FOR ~USECS OR SMALL MSECS ( 10us - 20ms):

Use usleep_range

Why not msleep for (1ms - 20ms)?
Explained originally here:
http://lkml.org/lkml/2007/8/3/250
msleep(1~20) may not do what the caller intends, and will often sleep longer (~20 ms actual sleep for any value given in the 1~20ms range). In many cases this is not the desired behavior.

Why is there no usleep / What is a good range?
Since usleep_range is built on top of hrtimers, the wakeup will be very precise (ish), thus a simple usleep function would likely introduce a large number of undesired interrupts.

With the introduction of a range, the scheduler is free to coalesce your wakeup with any other wakeup that may have happened for other reasons, or at the worst case, fire an interrupt for your upper bound.

The larger a range you supply, the greater a chance that you will not trigger an interrupt; this should be balanced with what is an acceptable upper bound on delay / performance for your specific code path. Exact tolerances here are very situation specific, thus it is left to the caller to determine a reasonable range.

SLEEPING FOR LARGER MSECS ( 10ms+ )

Use msleep or possibly msleep_interruptible

What's the difference?
msleep sets the current task to TASK_UNINTERRUPTIBLE whereas msleep_interruptible sets the current task to TASK_INTERRUPTIBLE before scheduling the sleep. In short, the difference is whether the sleep can be ended early by a signal. In general, just use msleep unless you know you have a need for the interruptible variant.
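
The thresholds in the excerpt can be folded into one helper. The following is a minimal sketch and not code from the kernel documentation: the names `pick_delay_method` and `delay_usecs`, the exact cut-offs (10 us and 20 ms) and the choice of twice the minimum as the `usleep_range` upper bound are assumptions made for illustration. The kernel-only part is guarded by `__KERNEL__` so that the selection logic can also be compiled on its own.

```c
#ifdef __KERNEL__
#include <linux/delay.h>
#include <linux/kernel.h>
#endif

/* Hypothetical names; the cut-offs follow the excerpt's rough guidance. */
enum delay_method { DELAY_UDELAY, DELAY_USLEEP_RANGE, DELAY_MSLEEP };

static enum delay_method pick_delay_method(unsigned long usecs)
{
	if (usecs < 10)
		return DELAY_UDELAY;        /* busy-wait: hrtimer setup is not worth it */
	if (usecs <= 20000)
		return DELAY_USLEEP_RANGE;  /* hrtimer-backed, 10 us to 20 ms */
	return DELAY_MSLEEP;                /* jiffies-backed, longer sleeps */
}

#ifdef __KERNEL__
/* Must be called from non-atomic (process) context: two branches sleep. */
static void delay_usecs(unsigned long usecs)
{
	switch (pick_delay_method(usecs)) {
	case DELAY_UDELAY:
		udelay(usecs);
		break;
	case DELAY_USLEEP_RANGE:
		/* Assumed slack: the wakeup may land anywhere in [usecs, 2 * usecs]. */
		usleep_range(usecs, 2 * usecs);
		break;
	case DELAY_MSLEEP:
		msleep(DIV_ROUND_UP(usecs, 1000));
		break;
	}
}
#endif
```

A caller that needs tighter timing would narrow the `usleep_range` window, at the cost of more timer interrupts. Recent kernels also appear to provide a comparable helper, `fsleep()`; check `include/linux/delay.h` in your tree before relying on it.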

I have compared all the above delay methods and measured the execution time with do_gettimeofday, keeping the same value for the delay (1 ms = 1000 us). I discovered that the usleep_range method waits for a seemingly random amount of time (absolutely not reliable). Conversely, the other methods are very near to the set point. Do you know why usleep_range is so inaccurate? – Antonio Petricca, Jun 15 '17 at 20:35
What is the exact command you were using? Also, what arch/board are you using? – Sam Protsenko, Jun 15 '17 at 23:09
The platform is "Linux Mint 18.1 x64 / VMWare hosted by Windows 10 PRO x64". The command is "usleep_range(1000, 1000)", but the same random sleeps arise with any value for the max range. – Antonio Petricca, Jun 16 '17 at 05:02
Strange... As I understand it, the hrtimer should fire at the max timeout (in the worst case). But frankly, I have never used `usleep_range()`. Can you provide a [minimal compilable example](http://sscce.org/) (module) that reproduces the issue? Also, it's probably a good idea to make it a new question. – Sam Protsenko, Jun 16 '17 at 20:33
I posted more details in this new thread https://stackoverflow.com/questions/44598602/linux-kernel-space-delay-on-embedded-device-c-h-i-p . Thank you! – Antonio Petricca, Jun 16 '17 at 22:05

score 1 · Answer 2 · answered Oct 05 '16 at 14:42

I've used a combination of an hrtimer and a waitqueue to implement a periodic task in a kernel thread:

1. Create a waitqueue and a periodic hrtimer.
2. Block the kernel thread on the waitqueue using wait_event()/wait_event_timeout().
3. In the hrtimer callback, call wake_up()/wake_up_all().
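
The steps above can be sketched as a small module. This is an illustrative outline under stated assumptions, not the answerer's actual code: all identifiers (`tick_wq`, `tick_cb`, `worker_fn`, `period_ns_from_hz`) and the 60 Hz rate are hypothetical, and it is written against the hrtimer API of the 4.x/5.x kernels discussed here (recent kernels initialise the timer with `hrtimer_setup()` instead of `hrtimer_init()`). The kernel-only part is guarded by `__KERNEL__` so the period arithmetic can be checked separately.

```c
#ifdef __KERNEL__
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/wait.h>
#include <linux/atomic.h>
#include <linux/err.h>
#endif

/* Period in nanoseconds for a rate in Hz (32-bit safe, truncating). */
static unsigned long period_ns_from_hz(unsigned long hz)
{
	return 1000000000UL / hz;
}

#ifdef __KERNEL__
#define PERIOD_HZ 60UL /* assumed rate, for illustration only */

static DECLARE_WAIT_QUEUE_HEAD(tick_wq);
static atomic_t tick_pending = ATOMIC_INIT(0);
static struct hrtimer tick_timer;
static struct task_struct *worker;
static ktime_t period;

/* Step 3: the timer callback wakes the thread and re-arms itself. */
static enum hrtimer_restart tick_cb(struct hrtimer *t)
{
	atomic_set(&tick_pending, 1);
	wake_up(&tick_wq);
	hrtimer_forward_now(t, period); /* keeps the original phase */
	return HRTIMER_RESTART;
}

/* Step 2: the thread blocks on the waitqueue until the next tick. */
static int worker_fn(void *unused)
{
	while (!kthread_should_stop()) {
		wait_event(tick_wq, atomic_read(&tick_pending) ||
				    kthread_should_stop());
		atomic_set(&tick_pending, 0);
		/* periodic work goes here */
	}
	return 0;
}

/* Step 1: create the thread and the periodic hrtimer. */
static int __init tick_init(void)
{
	period = ktime_set(0, period_ns_from_hz(PERIOD_HZ));
	worker = kthread_run(worker_fn, NULL, "tick_worker");
	if (IS_ERR(worker))
		return PTR_ERR(worker);
	hrtimer_init(&tick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	tick_timer.function = tick_cb;
	hrtimer_start(&tick_timer, period, HRTIMER_MODE_REL);
	return 0;
}

static void __exit tick_exit(void)
{
	hrtimer_cancel(&tick_timer);
	kthread_stop(worker);
}

module_init(tick_init);
module_exit(tick_exit);
MODULE_LICENSE("GPL");
#endif
```

Because the timer is re-armed from its own expiry time rather than from the moment the thread wakes up, time spent in the periodic work does not accumulate as drift.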

Also, I just found that you can implement a sleep using hrtimer_init_sleeper() and schedule(); see __wait_event_hrtimeout() or do_nanosleep(). But I have never tried that.
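
The sleeper-plus-schedule() pattern is also available prepackaged as schedule_hrtimeout(), which comes up in the comments. A minimal sketch of a periodic loop built on it follows, assuming absolute deadlines on CLOCK_MONOTONIC so that the time spent working is not added to the period. The names `next_deadline_ns`, `periodic_fn` and `PERIOD_NS` are hypothetical, and the kernel-only part is guarded by `__KERNEL__` so the deadline arithmetic can be exercised on its own.

```c
#ifdef __KERNEL__
#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#endif

/*
 * Advance an absolute deadline by whole periods until it lies strictly
 * after now_ns. Missed periods are skipped rather than replayed.
 * Written without 64-bit division so it also links on 32-bit kernels.
 */
static long long next_deadline_ns(long long deadline_ns, long long period_ns,
				  long long now_ns)
{
	do {
		deadline_ns += period_ns;
	} while (deadline_ns <= now_ns);
	return deadline_ns;
}

#ifdef __KERNEL__
#define PERIOD_NS 16666667LL /* assumed: roughly 60 Hz */

static int periodic_fn(void *unused)
{
	ktime_t deadline = ktime_get(); /* CLOCK_MONOTONIC */

	while (!kthread_should_stop()) {
		/* periodic work goes here */

		deadline = ns_to_ktime(next_deadline_ns(ktime_to_ns(deadline),
							PERIOD_NS,
							ktime_to_ns(ktime_get())));

		/* The task state must be set before calling schedule_hrtimeout(). */
		set_current_state(TASK_UNINTERRUPTIBLE);
		schedule_hrtimeout(&deadline, HRTIMER_MODE_ABS);
	}
	return 0;
}
#endif
```

With a relative timeout the loop period would be the sleep time plus the work time; computing each deadline from the previous one keeps the wakeups on a fixed grid. Residual jitter from interrupts or other activity on the CPU is not addressed by this sketch.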

alexander


I was thinking of using schedule_hrtimeout with TASK_UNINTERRUPTIBLE. What can affect the accuracy of this kind of sleep? – Andreea Tanasa Oct 05 '16 at 19:41
@Andreea Tanasa, I think that ISR handlers and (maybe) other threads on that CPU can affect the sleep delay. Threads can be kept off the CPU using the `isolcpus=` Linux parameter, and interrupts using `smp_affinity` (irqbalance with FOLLOW_ISOLCPUS). – alexander Oct 06 '16 at 11:37
Would something like this work? Set isolcpus; set smp_affinity of soft irqs; set preempt_disable in my kernel module; invoke schedule_hrtimeout to sleep for regular intervals. – Andreea Tanasa Oct 06 '16 at 11:58
Well, you can change the smp_affinity of some hardware (mmc, net, spi and so on) and keep the smp_affinity of the hrtimer driver, so that you only have hrtimer interrupts on that CPU. Anyway, I recommend playing around with this solution. – alexander Oct 06 '16 at 12:10
Can I disable kworker and migration on the CPU that I have booked (from my kernel module)? – Andreea Tanasa Oct 06 '16 at 12:22
I think it is impossible to disable those threads from your kernel module. You would have to comment out the creation of these threads somewhere in the kernel, and I don't know how that would affect the system. – alexander Oct 06 '16 at 12:57
I use schedule_hrtimeout, but it feels like the timer behind it is not handled with the highest possible priority. Can this be changed? – Andreea Tanasa Oct 06 '16 at 18:53
@Andreea Tanasa, how do you check this? Can you toggle a GPIO and look at the signal with an oscilloscope? The hrtimer interrupt priority depends on the CPU used, I think. The hardware timers behind hrtimers are registered in Linux using [clockevents_config_and_register()](http://lxr.free-electrons.com/ident?i=clockevents_config_and_register) – alexander Oct 07 '16 at 11:30
Not really. I just record the time with getrawmonotonic. In high-load situations the difference between timestamps is far from the period that I pass to schedule_hrtimeout. On the other hand, this function has some calls like init_on_stack, which makes me think that a "software" timer is created and registered on each call. A more permanent solution sounds better. How can one register a timer in the hrtimer infrastructure that will be handled with the highest priority? – Andreea Tanasa Oct 07 '16 at 11:37
@Andreea Tanasa, hrtimer is based on a hardware timer found in the CPU, and its interrupt priority is hardwired to some value. It is possible to change its priority using the interrupt controller, but I don't know how to do that in Linux. So you take two time samples using getrawmonotonic before and after the call to schedule_hrtimeout(), and sometimes you see a greater delay than the one you passed to schedule_hrtimeout()? – alexander Oct 07 '16 at 11:48
@Andreea Tanasa, or do you have a loop in the thread in which you do the work, call schedule_hrtimeout() and take a timestamp using getrawmonotonic(), and you want every loop iteration to start every millisecond? There could be an error here if you don't take into account the time spent doing the work. – alexander Oct 07 '16 at 11:50
"You take two time samples using getrawmonotonic before and after call to schedule_hrtimeout() and sometimes you see greater delay than you pass to schedule_hrtimeout()?" Yes, this is how it is. But it feels like too much for a high-resolution timer. All soft irqs are moved and I use isolcpus. – Andreea Tanasa Oct 07 '16 at 11:58
@Andreea Tanasa, just curious: what value do you pass to schedule_hrtimeout() and what do you sometimes get? If these latencies are unacceptable for your task, it may be better to try the [Real-Time Linux](https://rt.wiki.kernel.org/index.php/Main_Page) patches, but I have never tried them. – alexander Oct 07 '16 at 12:06
The kernel tick is 100 Hz and what I pass to the timeout function corresponds to 60 Hz. The difference between samples is sometimes as much as 5 ms away from the 60 Hz period. Somehow this makes sense because 10 ms and 16 ms are not really multiples of each other. But still ... – Andreea Tanasa Oct 07 '16 at 12:09
You pass 1/60 s = 16.666667 ms? And what delay do you get using getrawmonotonic()? Can you check the delays using ktime_get_ns()? – alexander Oct 07 '16 at 12:29
This is what I use now. The drift is up to 5 ms. – Andreea Tanasa Oct 07 '16 at 12:31
5 ms is a really big drift. Could you create a small buildable test and post it in the question? I want to see how you create the thread, how you call schedule_hrtimeout() and how you measure the timeouts. And I want to run it on ARM Linux 4.1. – alexander Oct 07 '16 at 12:46
Not possible. Could you please post some (pseudo)code showing what correct usage would look like? I run that thread while the cores are 100% loaded. I use different programs to load the CPUs, such as fft, qsort, etc. – Andreea Tanasa Oct 07 '16 at 12:56
