Why does Sleep() slow down subsequent code for 40ms?

Question

I originally asked about this at coderanch.com, so if you've tried to assist me there, thanks, and don't feel obliged to repeat the effort. coderanch.com is mostly a Java community, though, and this appears (after some research) to really be a Windows question, so my colleagues there and I thought this might be a more appropriate place to look for help.

I have written a short program that either spins on the Windows performance counter until 33ms have passed, or else calls Sleep(33). The former exhibits no unexpected effects, but the latter appears to (inconsistently) slow subsequent processing for about 40ms (either that, or it has some effect on the values returned from the performance counter for that long). After the spin or Sleep(), the program calls a routine, runInPlace(), that spins for 2ms, counting the number of times it queries the performance counter, and returning that number.

When the initial 33ms delay is done by spinning, the number of iterations runInPlace() reports tends to be (on my Windows 10, XPS-8700) about 250,000. It varies, probably due to other system overhead, but it varies smoothly around 250,000.

Now, when the initial delay is done by calling Sleep(), something strange happens. A lot of the calls to runInPlace() return a number near 250,000, but quite a few of them return a number near 50,000. Again, the range varies around 50,000, fairly smoothly. But it is clearly averaging one or the other, with nearly no returns anywhere between 80,000 and 150,000. If I call runInPlace() 100 times after each delay, instead of just once, it never returns a number of iterations in the smaller range after the 20th call. As runInPlace() runs for 2ms, this means the behavior I'm observing disappears after 40ms. If I have runInPlace() run for 4ms instead of 2ms, it never returns a number of iterations in the smaller range after the 10th call, so, again, the behavior disappears after 40ms (likewise if I have runInPlace() run for only 1ms; the behavior disappears after the 40th call).
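The 40ms figure is simply the position of the last slow call multiplied by the length of each call. A minimal sketch of that arithmetic follows, assuming a hypothetical helper `slowWindowMs` and a cutoff of 100,000 iterations to separate "slow" from "fast" results (neither is part of the original program; the cutoff is chosen only because almost no results fall between 80,000 and 150,000):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the original program.
// counts:        iteration counts from consecutive runInPlace() calls
// sliceMs:       how long each runInPlace() call spins, in milliseconds
// slowThreshold: a call is treated as "slow" if its count is below this value
// Returns the time (in ms) at which the last slow call ended, or 0 if none was slow.
int slowWindowMs(const std::vector<int>& counts, int sliceMs, int slowThreshold)
{
    assert(sliceMs > 0);

    std::size_t lastSlow = 0;  // 1-based position of the last slow call

    for (std::size_t i = 0; i < counts.size(); ++i)
    {
        if (counts[i] < slowThreshold)
            lastSlow = i + 1;
    }

    return static_cast<int>(lastSlow) * sliceMs;
}
```

With 2ms calls where the 20th is the last slow one, this gives 20 × 2 = 40ms; with 4ms calls where the 10th is the last slow one, it gives 10 × 4 = 40ms.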

Here's my code:

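```cpp
#include "stdafx.h"    // Visual C++ precompiled header; the default console template pulls in <stdio.h> and <tchar.h>
#include <Windows.h>

// Spin until msDelay performance-counter ticks have elapsed and return
// how many times the counter was queried in that interval.
// Note: despite its name, msDelay is expressed in ticks, not milliseconds.
int runInPlace(int msDelay)
{
    LARGE_INTEGER t0, t1;
    int n = 0;

    QueryPerformanceCounter(&t0);

    do
    {
        QueryPerformanceCounter(&t1);
        n++;
    } while (t1.QuadPart - t0.QuadPart < msDelay);

    return n;
}

int _tmain(int argc, _TCHAR* argv[])
{
    LARGE_INTEGER t0, t1;
    LARGE_INTEGER frequency;
    int n;

    QueryPerformanceFrequency(&frequency);

    // Convert milliseconds to performance-counter ticks.
    int msDelay = (int)(2 * frequency.QuadPart / 1000);     // 2ms measurement window
    int spinDelay = (int)(33 * frequency.QuadPart / 1000);  // 33ms delay before each measurement

    for (int i = 0; i < 100; i++)
    {
        // Any command-line argument selects Sleep(); otherwise spin for the delay.
        if (argc > 1)
            Sleep(33);
        else
        {
            QueryPerformanceCounter(&t0);

            do
            {
                QueryPerformanceCounter(&t1);
            } while (t1.QuadPart - t0.QuadPart < spinDelay);
        }

        n = runInPlace(msDelay);
        printf("%d \n", n);
    }

    getchar();  // keep the console window open

    return 0;
}
```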

Here's some output typical of what I get when using Sleep() for the delay:

56116 248936 53659 34311 233488 54921 47904 45765 31454 55633 55870 55607 32363 219810 211400 216358 274039 244635 152282 151779 43057 37442 251658 53813 56237 259858 252275 251099

And here's some output typical of what I get when I spin to create the delay:

276461 280869 276215 280850 188066 280666 281139 280904 277886 279250 244671 240599 279697 280844 159246 271938 263632 260892 238902 255570 265652 274005 273604 150640 279153 281146 280845 248277

Can anyone help me understand this behavior? (Note, I have tried this program, compiled with Visual C++ 2010 Express, on five computers. It only shows this behavior on the two fastest machines I have.)

Sleep only guarantees a minimum time. After the time expires it will wait for the next available time slice before resuming. — Richard Critten, Feb 28 '16 at 00:13
Sleep() does what it says, it literally puts the processor to sleep. It does absolutely nothing, stopped by the HLT instruction. It can only be woken up by an interrupt. Those interrupts are periodic, by default they fire 64 times per second. So actual sleep time is 15.625 or 31.250 or 46.875 etc msec. That can be messed with, start Chrome for example. Underlying call it uses is timeBeginPeriod(). More well-behaved browsers change the rate to 10 msec. So you get 10 or 20 or 30 or 40, etc. Getting 33 msec requires changing the period to 1 msec. — Hans Passant, Feb 28 '16 at 00:13
That's true, but it wouldn't explain why code that executes after Sleep() returns runs slowly for a while. — Stevens Miller, Feb 28 '16 at 00:13
You need to read the documentation for [Sleep](https://msdn.microsoft.com/en-us/library/windows/desktop/ms686298.aspx) again. You are basing your reasoning off of rules that do not coincide with reality. Also relevant: [Windows Timer Coalescing](http://go.microsoft.com/fwlink/p/?linkid=246618). @RichardCritten: That's not what the documentation says. A `Sleep` does not guarantee the minimum you are talking about. — IInspectable, Feb 28 '16 at 00:15
@IInspectable, I understand that, but why would anything that runs after Sleep() returns run any differently than if it were preceded by a spin instead of Sleep()? I'm not timing the delay before runInPlace() is called. I'm timing how fast runInPlace runs after the delay. How could Sleep() affect anything that runs after Sleep() returns, regardless of how long Sleep slept? — Stevens Miller, Feb 28 '16 at 00:17
@IInspectable "After the sleep interval has passed, the thread is ready to run. If you specify 0 milliseconds, the thread will relinquish the remainder of its time slice but remain ready. Note that a ready thread is not guaranteed to run immediately. Consequently, the thread may not run until some time after the sleep interval elapses.". Source: https://msdn.microsoft.com/en-us/library/windows/desktop/ms686298(v=vs.85).aspx — Richard Critten, Feb 28 '16 at 00:19
@RichardCritten: You also need to read the Remarks section: *"If dwMilliseconds is less than the resolution of the system clock, the thread may sleep for less than the specified length of time. [...]"* — IInspectable, Feb 28 '16 at 00:21
I suspect there is a cache-locality improvement in the case where you have already called `QueryPerformanceCounter()` a number of times. — user207421, Feb 28 '16 at 00:31
Thanks, @EJP, you actually addressed my question. Caching issues crossed my mind too, but I wouldn't expect that slow-down effect to appear after runInPlace() had ever run at full speed in that case, and, when I run it more than once after the inter-call delay, it sometimes does. This is really one of the strangest behaviors I've seen in code in quite a while. — Stevens Miller, Feb 28 '16 at 02:12

score 9 · Accepted Answer · answered Feb 28 '16 at 02:02

This sounds like it is due to the reduced clock speed that the CPU will run at when the computer is not busy (SpeedStep). When the computer is idle (like in a sleep) the clock speed will drop to reduce power consumption. On newer CPUs this can be 35% or less of the listed clock speed. Once the computer gets busy again there is a small delay before the CPU will speed up again.

You can turn off this feature, either in the BIOS or by changing the "Minimum processor state" setting under "Processor power management" in the advanced settings of your power plan to 100%.
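One way to check this explanation is to sleep and then time a series of short, consecutive slices: if clock scaling is the cause, the first slices after the sleep should show lower counts that recover once the CPU ramps back up. The following is only a portable sketch using `std::chrono` instead of the Windows calls in the question; the names `spinFor` and `countsAfterSleep` are made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical helper: spin for one slice and return how many times
// the clock was read during it (always at least once).
long long spinFor(std::chrono::microseconds slice)
{
    using Clock = std::chrono::steady_clock;

    const auto start = Clock::now();
    long long n = 0;

    do
    {
        ++n;
    } while (Clock::now() - start < slice);

    return n;
}

// Hypothetical helper: sleep once, then record the spin count of each
// of `slices` consecutive slices that follow the sleep.
std::vector<long long> countsAfterSleep(std::chrono::milliseconds sleepTime,
                                        std::chrono::microseconds slice,
                                        int slices)
{
    assert(slices > 0);

    std::this_thread::sleep_for(sleepTime);

    std::vector<long long> counts;
    counts.reserve(slices);

    for (int i = 0; i < slices; ++i)
        counts.push_back(spinFor(slice));

    return counts;
}
```

Calling `countsAfterSleep(std::chrono::milliseconds(33), std::chrono::microseconds(2000), 100)` mirrors the setup in the question. If the explanation above is right, the low counts should cluster in the first slices and should disappear when the minimum processor state is set to 100%.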

answered Feb 28 '16 at 02:02

1201ProgramAlarm

You nailed it @1201ProgramAlarm ! My minimum processor speed was set to 5%. I set it to 100% and the behavior has consistently vanished. Restoring the setting to 5% instantly returned the behavior. I honestly never would have thought of that, as I would have expected the speed to change much more quickly on its own, and I would also assume so much else was going on in my computer that a meaningful drop wasn't going to happen. Amazing. Thanks for the answer and for actually reading my question. – Stevens Miller Feb 28 '16 at 02:47
@StevensMiller: Modern CPUs are fast enough to not be useless at their most efficient speed. OSes don't jump to max speed until they've seen a process use its entire timeslice a few times, indicating that it has more work to do than it can keep up with at this clock speed. Otherwise it is actually best to leave the CPU at a low clock speed, because it's "fast enough" for whatever the CPU is doing. Intel Skylake moves the decision making into the CPU, so it can respond much more quickly to load and to non-load (microseconds instead of milliseconds). – Peter Cordes Feb 29 '16 at 07:09
You only need to make your computer less efficient if this behaviour is actually causing a problem (e.g. realtime guarantees not being met in responding to the first request after a gap). Even so, you could make the minimum speed 80% or something. Speed increases also require increased voltage, so power increases more than linearly with frequency. The total energy to do a given computation increases if you do it faster. (race-to-sleep works well, but running at a more efficient speed is better until you get down to the minimum voltage for correct operation at any clock speed). – Peter Cordes Feb 29 '16 at 07:14
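The "more than linearly" remark follows from the usual first-order model of dynamic CMOS power. This is an approximation that ignores static (leakage) power, and the symbols are introduced only for this illustration: $C$ is the switched capacitance, $V$ the supply voltage, $f$ the clock frequency, and $N$ the number of cycles the computation needs.

$$P_{\text{dyn}} \approx C V^{2} f, \qquad E = P_{\text{dyn}} \cdot \frac{N}{f} \approx C V^{2} N$$

Because $V$ has to rise along with $f$, power grows faster than $f$, and the energy for a fixed $N$ cycles grows with $V^{2}$. Leakage power, which this model leaves out, is drawn for as long as the chip is awake, and that is the part that favors race-to-sleep.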

score 5 · Answer 2 · Matteo Italia · 2016-02-28T02:16:13.677

Besides what @1201ProgramAlarm said (which may very well be the case; modern processors are extremely fond of downclocking whenever they can), it may also be a cache warm-up problem.

When you ask to sleep for a while, the scheduler typically schedules another thread/process for the next CPU time quantum, which means that the caches (instruction cache, data cache, TLB, branch predictor data, ...) holding your process's state are going to be "cold" again when your code regains the CPU.

edited Feb 28 '16 at 02:16

answered Feb 28 '16 at 02:11

Matteo Italia


Thanks, @Matteo Italia. Looks like 1201ProgramAlarm got it right, but your thoughts are also relevant and I'll have them in mind for anything similar. I'm grateful to you both for reading my question carefully enough to know what I was really asking. As some of the early comments show, almost any question involving Sleep provokes a discussion about granularity and accuracy, concepts that aren't relevant here. Cheers, fellows! – Stevens Miller Feb 28 '16 at 02:48
