I'm doing some experiments with low-latency programming. I want to eliminate context switching and to be able to measure latency reliably without affecting performance too much.
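For context, the kind of isolation setup I'm experimenting with is roughly the sketch below; the SCHED_FIFO policy and the priority value are just placeholders I'm trying out (running it needs root or CAP_SYS_NICE), not something the numbers further down depend on.

#define _GNU_SOURCE
#include <sched.h>
#include <sys/mman.h>
#include <stdio.h>

/* Sketch: put the process under a real-time policy and lock its pages,
 * so that neither the regular scheduler nor page faults get in the way.
 * The priority 80 is an arbitrary placeholder. */
static void
setup_realtime(void)
{
    struct sched_param sp = { .sched_priority = 80 };
    if (sched_setscheduler(0, SCHED_FIFO, &sp))
        perror("sched_setscheduler");
    if (mlockall(MCL_CURRENT | MCL_FUTURE))
        perror("mlockall");
}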
To begin with, I wrote a program that requests the time in a loop 1M times and then prints statistics (code below), since I wanted to know how long the timer call itself takes. Surprisingly, the output is the following (in microseconds):
Mean: 0.59, Min: 0.49, Max: 25.77
Mean: 0.59, Min: 0.49, Max: 11.73
Mean: 0.59, Min: 0.42, Max: 14.11
Mean: 0.59, Min: 0.42, Max: 13.34
Mean: 0.59, Min: 0.49, Max: 11.45
Mean: 0.59, Min: 0.42, Max: 14.25
Mean: 0.59, Min: 0.49, Max: 11.80
Mean: 0.59, Min: 0.42, Max: 12.08
Mean: 0.59, Min: 0.49, Max: 21.02
Mean: 0.59, Min: 0.42, Max: 12.15
As you can see, although the average time is well under one microsecond, there are spikes of roughly 10-25 microseconds. That's despite the fact that the code runs on a dedicated core (its affinity is set to a specific core, while the affinity of the init process is set to a group of other cores) and that hyper-threading is disabled on the machine. I tried it with multiple kernel versions, including preemptible and RT ones, and the results are essentially the same.
Can you explain the huge difference between mean and max?
Is the problem in the timer calls themselves, or with process isolation?
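To help narrow that down on my side, I'm thinking of sampling the involuntary context-switch counter around each measurement loop, along the lines of the sketch below (RUSAGE_THREAD is Linux-specific); a non-zero delta across a loop would point at the scheduler rather than at clock_gettime itself.

#define _GNU_SOURCE
#include <sys/resource.h>
#include <stdio.h>

/* Sketch: return the number of involuntary context switches so far.
 * Calling it before and after a measurement loop and comparing the
 * two values shows whether the process was preempted in between. */
static long
nivcsw_now(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_THREAD, &ru))
    {
        perror("getrusage");
        return -1;
    }
    return ru.ru_nivcsw;
}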
I also tried this with calls to other clocks (by swapping the clock id in the helper; see the sketch right after this list):
- CLOCK_THREAD_CPUTIME_ID
- CLOCK_MONOTONIC
- CLOCK_PROCESS_CPUTIME_ID
The observed pattern was the same for all of them.
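Concretely, what I swap in is an equivalent of time_now() parameterized on the clock id, roughly like this:

#include <stdint.h>
#include <time.h>

/* Same helper as time_now() below, but taking the clock as an argument,
 * e.g. time_now_clk(CLOCK_MONOTONIC) or time_now_clk(CLOCK_THREAD_CPUTIME_ID). */
static uint64_t
time_now_clk(clockid_t clk)
{
    struct timespec ts;
    clock_gettime(clk, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}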
#define _GNU_SOURCE /* needed for cpu_set_t / CPU_SET when compiling as C */
#include <time.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

/* Read the current time and return it as nanoseconds since the epoch. */
uint64_t
time_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Pin the calling process to a single CPU. */
void
set_affinity(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set))
    {
        perror("sched_setaffinity");
    }
}

#define NUM 1000000
#define LOOPS 10

int
main(int argc, char **argv)
{
    set_affinity(3);
    for (int loop = 0; loop < LOOPS; ++loop)
    {
        uint64_t t_0 = time_now();
        uint64_t sum_val = 0;
        uint64_t max_val = 0;
        uint64_t min_val = UINT64_MAX;
        for (int k = 0; k < NUM; ++k)
        {
            /* Measure the interval between two consecutive timer calls. */
            uint64_t t_1 = time_now();
            uint64_t t_diff = t_1 - t_0;
            sum_val += t_diff;
            min_val = min(t_diff, min_val);
            max_val = max(t_diff, max_val);
            t_0 = t_1;
        }
        /* Print mean/min/max in microseconds. */
        printf("Mean: %.2f, Min: %.2f, Max: %.2f\n",
               ((double)sum_val) / NUM / 1000,
               ((double)min_val) / 1000,
               ((double)max_val) / 1000);
    }
    return 0;
}
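For reference, I build and run it as plain C with something like the following (the file name is arbitrary; on older glibc, clock_gettime may also need -lrt):

gcc -O2 -std=gnu11 -o timer_test timer_test.c
./timer_test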