I have a problem with experiment on my computer. I've done 300 tests of parallel algorithm (32 threads) and seen, that runtime of about 10% tests is less than others. It looks like that: we have 100 tests with runtime of each about 100 ms, then we have 30 tests with runtime ~ 80 ms and again 170 tests with runtime ~100 ms. It happens every experiment. I used OpenMP, TBB, PTHREAD, std::Thread and it happens with every parallel technology. What's the reason of that?
CPU: Intel® Core™ i7 Kaby Lake H 2800 - 3800 MHz Cores: 4 Threads: 8