
I'm using a Raspberry Pi and I need really fast performance from my CPU for a certain process.

To achieve that, I added isolcpus=3 to my kernel boot parameters to reserve core 3 for this process only.

Looking at /proc/interrupts, the interrupt count on this core also appears to be minimal after isolation.
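isolcpus only takes CPU 3 out of the scheduler's normal load balancing. A process still has to be pinned there explicitly, which is what taskset does. Below is a minimal sketch of doing the same pinning from inside the program. It assumes Linux with glibc (`sched_setaffinity`, `CPU_SET`), and the helper names `cpu_mask_for` and `pin_to_cpu` are hypothetical. It also shows why the taskset mask 8 selects CPU 3: bit n of the mask stands for CPU n, and 8 is binary 1000.

```c
#define _GNU_SOURCE          /* glibc needs this for sched_setaffinity and CPU_SET */
#include <sched.h>

/* Hypothetical helper: taskset-style bitmask for a single CPU (small CPU numbers only).
   Bit n set means "CPU n allowed", so CPU 3 -> 0x8, matching `taskset -p 8`. */
unsigned long cpu_mask_for(int cpu) {
    return 1UL << cpu;
}

/* Hypothetical helper: pin the calling thread to one CPU.
   Returns 0 on success, -1 on error (errno is set by sched_setaffinity). */
int pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

For a single-threaded program, calling `pin_to_cpu(3)` at the start of `main` should have the same effect as running `taskset -p 8 PID` on it afterwards.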

Now I'm running this code on the isolated CPU (pinned with taskset -p 8 PID):

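for (i = 0; i < 254; i++) {
    clock_gettime(CLOCK_REALTIME, &start);
    /* empty busy loop; compiled with -O0 so it is not optimized away */
    for (rep = 0; rep < 10000000; rep++) {
    }
    clock_gettime(CLOCK_REALTIME, &end);
    timespec_diff(&start, &end, &diff);
    /* tv_nsec is a long, so it needs %ld rather than %d */
    printf("%ld\n", diff.tv_nsec);
}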
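The loop above relies on a `timespec_diff` helper that isn't shown, and printing only `diff.tv_nsec` would silently drop whole seconds if an iteration ever took longer than 1 s. Here is a self-contained sketch of the same measurement with hypothetical helpers `timespec_diff_ns` (full difference in nanoseconds) and `time_busy_loop`. Two choices differ from the original and are assumptions on my part. First, it uses `CLOCK_MONOTONIC`, which does not jump when the system time is set, unlike `CLOCK_REALTIME`. Second, it declares the loop counter `volatile` so the empty loop survives even at `-O2`.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime and CLOCK_MONOTONIC */
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: end - start in nanoseconds, including whole seconds. */
int64_t timespec_diff_ns(const struct timespec *start, const struct timespec *end) {
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (end->tv_nsec - start->tv_nsec);
}

/* Hypothetical helper: time one run of an empty loop of `reps` iterations.
   `volatile` forces every increment to be performed, whatever the -O level. */
int64_t time_busy_loop(long reps) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (volatile long rep = 0; rep < reps; rep++) {
        /* intentionally empty */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    return timespec_diff_ns(&start, &end);
}
```

In the original outer loop, the body would then shrink to something like `printf("%lld\n", (long long)time_busy_loop(10000000));`.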

The output I see is:

133562686, 133525447, 133536802, 133525760, 133540134, 133555290, 133540135, 133542218, 133525552, 133524979, 133577791, 133523208, 133525604, 133545916, 87085933, 66719079, 66719339, 66726787, 66719912, 66718870, 66712048, 76724670, 133535917, 133525396, 133528260, 133578416, 133522740, 133525552, 133541177, 133526021, 133553677, 133541906

This is only part of the output. The time is usually consistent at around 133525760 ns, but sometimes it becomes faster for a while, by a factor of 2.
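Dividing by the 10,000,000 iterations makes the step easier to see. The usual samples work out to about 13.35 ns per iteration and the fast ones to about 6.67 ns. A small sketch of that arithmetic follows, using sample values copied from the output above. The helpers `ns_per_iteration` and `speedup` are hypothetical names.

```c
/* Hypothetical helper: average time per loop iteration, in nanoseconds. */
double ns_per_iteration(long long total_ns, long long reps) {
    return (double)total_ns / (double)reps;
}

/* Hypothetical helper: how many times faster the fast sample is than the slow one. */
double speedup(long long slow_ns, long long fast_ns) {
    return (double)slow_ns / (double)fast_ns;
}
```

`speedup(133525760, 66719079)` comes out at roughly 2.0013. So the fast phase is almost exactly twice as fast, not merely somewhat faster.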

The tasks running on core 3 are:

  PID   TID CLS RTPRIO  NI PRI PSR %CPU STAT WCHAN          COMMAND
   22    22 TS       -   0  19   3  0.0 S    -              cpuhp/3
   23    23 FF      99   - 139   3  0.0 S    -              migration/3
   24    24 TS       -   0  19   3  0.0 S    -              ksoftirqd/3
   25    25 TS       -   0  19   3  0.0 S    -              kworker/3:0
   26    26 TS       - -20  39   3  0.0 S<   -              kworker/3:0H
 1158  1158 TS       - -20  39   3  0.0 S<   -              kworker/3:1H
 1159  1159 TS       -   0  19   3  0.0 S    -              kworker/3:1
 5907  5907 TS       -   0  19   3 99.1 R    -              a.out

According to ps, my process's CPU usage varies between 99 and 100 percent (I also don't understand why it isn't consistently 100%), so the fact that the time is halved doesn't make sense to me.
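The ps man page defines %CPU as the CPU time used divided by the time the process has been running. That is a cumulative figure, so it need not read exactly 100. A more direct check is to measure CPU time alongside wall-clock time for each loop run. If the process is never descheduled, the ratio should stay close to 1 in both the slow and the fast phases. Below is a minimal sketch, assuming Linux/POSIX `CLOCK_PROCESS_CPUTIME_ID`. `cpu_share_of_loop` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime and the POSIX clock IDs */
#include <stdint.h>
#include <time.h>

static int64_t ts_ns(const struct timespec *t) {
    return (int64_t)t->tv_sec * 1000000000LL + t->tv_nsec;
}

/* Hypothetical helper: run an empty loop and return
   (CPU time consumed by this process) / (elapsed wall-clock time).
   The CPU-time interval is nested inside the wall-clock interval, so for a
   single-threaded process the result should stay at or just below 1. */
double cpu_share_of_loop(long reps) {
    struct timespec w0, w1, c0, c1;
    clock_gettime(CLOCK_MONOTONIC, &w0);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);
    for (volatile long rep = 0; rep < reps; rep++) {
        /* intentionally empty */
    }
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);
    clock_gettime(CLOCK_MONOTONIC, &w1);
    return (double)(ts_ns(&c1) - ts_ns(&c0)) / (double)(ts_ns(&w1) - ts_ns(&w0));
}
```

Printing this share next to each timing sample would show whether the ~66.7 ms samples coincide with any change in how much of the CPU the process actually got.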

Both speeds are good enough for me; I just need the timing to be consistent. Does anyone have an idea why this could happen? Is there any way I can make my loop time consistent?

Edited by phuclv; asked by user1364700.
  • Try asking here: https://raspberrypi.stackexchange.com/ – janfitz Dec 18 '17 at 13:27
  • Maybe the low value is normal and the higher value is when the other CPU cores are idle. Run one or two additional processes that do infinite loops and check again. – i486 Dec 18 '17 at 13:36
  • @janfitz I will try if I don't get an answer here. Thanks. – user1364700 Dec 18 '17 at 13:38
  • @i486 I tried running a second process with an infinite loop, but it still happens. Nice idea, though. – user1364700 Dec 18 '17 at 13:55
  • Just a shot in the dark: could this be a branch prediction issue? Also, the compiler might eliminate the inner loop completely. It's always worth thinking about such internals when doing a quick and dirty benchmark. – lwi Dec 19 '17 at 13:59
  • @lwi I forgot to mention that I compiled the code with the -O0 flag (no optimization). I also tried replacing the loop with asm("add r3, r3, #1") repeated 10000 times. Still the same. – user1364700 Dec 24 '17 at 10:45
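i486's suggestion can be scripted so that the extra load starts and stops together with the benchmark. Below is a minimal sketch with hypothetical helpers `start_spinners` and `stop_spinners`, using plain `fork`/`kill`/`waitpid`. The children are not pinned. They do inherit the parent's CPU affinity mask, though, so they should be started before the benchmark pins itself to CPU 3. Otherwise they would compete with it on the same core.

```c
#define _POSIX_C_SOURCE 200809L   /* for kill() under strict standard modes */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork up to n children that each spin forever.
   Stores their PIDs in pids and returns how many were started. */
int start_spinners(int n, pid_t *pids) {
    int started = 0;
    for (int k = 0; k < n; k++) {
        pid_t pid = fork();
        if (pid < 0)
            break;              /* fork failed: stop early */
        if (pid == 0) {
            for (;;) { }        /* child: busy-loop until killed */
        }
        pids[started++] = pid;  /* parent: remember the child */
    }
    return started;
}

/* Hypothetical helper: SIGKILL and reap the children started above.
   Returns how many were successfully reaped. */
int stop_spinners(int n, const pid_t *pids) {
    int reaped = 0;
    for (int k = 0; k < n; k++) {
        kill(pids[k], SIGKILL);
        if (waitpid(pids[k], NULL, 0) == pids[k])
            reaped++;
    }
    return reaped;
}
```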
