
How can I constrain all my benchmarks to run on a single CPU (e.g. C0)? I am running benchmarks and want to expose my tests to an environment similar to the target. I would also like advice on how to ensure that as few other processes as possible are running while I run the benchmarks.

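// StopWatch lives in std.datetime.stopwatch in current D releases
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio;

void algorithm() {
    writeln("Hello!");
}

void main() {
    // Create the stopwatch stopped, then start it right before the timed code
    auto stopWatch = StopWatch(AutoStart.no);
    stopWatch.start();

    algorithm();

    stopWatch.stop();
    // peek() returns the elapsed time as a core.time.Duration
    auto duration = stopWatch.peek();
    writeln("Hello Duration ==> ", duration);
}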
Roman
Walker
  • Can you add more details about the test environment, and say whether any type of parallelism could be used in `algorithm()`? Assuming a single-node environment, there are a number of ways to set CPU affinity in threaded code. In multi-node settings, e.g. using MPI, process affinity can typically be set at launch time or through a resource manager. – Matt Jul 23 '15 at 18:07
  • Google for "affinity mask". – Andriy Tylychko Jul 23 '15 at 18:09
  • @Matt there is no parallelism in algorithm(); it is just a simple string-matching algorithm. I am actually trying to compare three such algorithms so I know which one is fastest for strings of a particular type (length, repeated characters, etc.). My tests run on Linux Mint 64-bit with 4 processors, and I want all the algorithms to execute on the same processor. I develop in MonoDevelop and run the programs manually in a shell. I don't know if this covers everything you asked; I am relatively new to clean benchmarking. – Walker Jul 23 '15 at 20:08
  • That is helpful, thanks. In the past, when I have done this sort of testing, I used `numactl` to bind tasks to CPUs and cores. Background noise can be reduced by shutting off unnecessary services and unloading unnecessary drivers. Depending on the type of code being run, there may also be BIOS and kernel settings that can be changed to improve performance and reduce unnecessary background tasks. The power and performance settings in the BIOS probably contain most of the relevant options, though built-in virtualization and/or hyper-threading can have an impact as well. – Matt Jul 23 '15 at 20:16
  • Thanks, I have just read the reference at https://msdn.microsoft.com/en-us/library/windows/desktop/ms684251(v=vs.85).aspx. It confirms that, for choosing the processor to execute on, `numactl --physcpubind=0` would be ideal. Another parameter I may have to investigate is whether I can ensure that, for all the algorithms, the static variables (the strings being compared) are stored at the same location in RAM on every run. I am aware the OS is responsible for this, but I wonder how much control I have here as well. – Walker Jul 23 '15 at 21:48

2 Answers


Have you tried using numactl? It is very useful for memory and process binding; here is a link to the man page.

For example,

numactl --physcpubind=0 myapp args

will bind the process myapp to CPU 0 (processor 0 as numbered in /proc/cpuinfo).

Depending on exactly what you want to do, the syntax may differ, for example when selecting specific cores on a CPU or when binding memory. The format of your application's own arguments may also affect the numactl command line.
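If you would rather pin the benchmark from inside the program than wrap it in numactl, the following is a minimal D sketch for Linux, assuming glibc. It calls sched_setaffinity on the calling thread with a mask containing only CPU 0 (roughly what `numactl --physcpubind=0` arranges for the whole process), then times the question's algorithm. The helper pinToCpu is hypothetical, and the glibc prototypes are declared by hand rather than taken from druntime.

```d
// Linux/glibc-only sketch: pin the calling thread to one CPU, then time
// the algorithm. pinToCpu is a hypothetical helper, not a library API.
import core.stdc.errno : EINVAL, errno;
import core.stdc.string : strerror;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;
import std.string : fromStringz;

// glibc prototypes, declared by hand (pid_t is int on Linux).
extern (C) nothrow @nogc
{
    int sched_setaffinity(int pid, size_t cpusetsize, const(void)* mask);
    int sched_getcpu();
}

/// Restricts the calling thread to `cpu`; returns 0 on success or an errno value.
int pinToCpu(uint cpu)
{
    ulong[16] mask; // 1024 bits, the size of glibc's cpu_set_t; zero-initialized
    if (cpu >= mask.length * 64)
        return EINVAL;
    mask[cpu / 64] |= 1UL << (cpu % 64);
    // pid 0 = the calling thread; threads it creates later inherit the mask.
    return sched_setaffinity(0, mask.sizeof, mask.ptr) == 0 ? 0 : errno;
}

void algorithm()
{
    writeln("Hello!");
}

void main()
{
    const err = pinToCpu(0);
    if (err != 0)
    {
        writeln("Could not pin to CPU 0: ", strerror(err).fromStringz);
        return;
    }
    writeln("Running on CPU ", sched_getcpu());

    auto stopWatch = StopWatch(AutoStart.yes);
    algorithm();
    stopWatch.stop();
    writeln("Hello Duration ==> ", stopWatch.peek());
}
```

Unlike numactl, this only affects the thread that makes the call (and threads it starts afterwards), which is enough for a single-threaded benchmark like the one in the question.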

As for reducing the number of other processes, there are several options, but the specifics are somewhat OS-dependent. If you really want to test the system in an environment with minimal background noise, you could build a custom OS image containing only the packages required to boot the node and run the benchmarks. This approach is similar to the one used in many modern HPC clusters. If you have multiple servers available, a cluster management tool like Warewulf might be useful; there are many reference designs and recipes available online for building a small cluster.

Other options include shutting down any unnecessary background programs and applications. You can also stop unnecessary services and unload unused kernel modules.

Some power and performance settings in the BIOS may also have an impact. Settings related to power consumption can affect frequency scaling and throttling, which can produce unpredictable results during performance tests. These factors mostly affect workloads that perform large numbers of floating-point operations, but they can apply to any CPU-intensive operation.
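One concrete thing to check on Linux before a run is the cpufreq scaling governor, since a power-saving governor lets the clock frequency change during the measurement. A minimal D sketch, assuming the standard sysfs layout under /sys/devices/system/cpu (the cpufreq files are often missing in VMs and containers); scalingGovernor is a hypothetical helper name.

```d
// Linux-only sketch: print the cpufreq scaling governor of every CPU.
// scalingGovernor is a hypothetical helper; the sysfs paths are the
// standard Linux cpufreq interface, which may be absent in VMs/containers.
import std.file : SpanMode, dirEntries, exists, readText;
import std.path : baseName, buildPath;
import std.stdio : writefln;
import std.string : strip;

/// Returns the governor for the CPU directory `cpuDir` (e.g. ".../cpu0"),
/// or null when the cpufreq interface is not exposed.
string scalingGovernor(string cpuDir)
{
    auto path = buildPath(cpuDir, "cpufreq", "scaling_governor");
    return exists(path) ? readText(path).strip : null;
}

void main()
{
    // Matches cpu0, cpu1, ... but not "cpufreq" or "cpuidle".
    foreach (entry; dirEntries("/sys/devices/system/cpu", "cpu[0123456789]*", SpanMode.shallow))
    {
        auto governor = scalingGovernor(entry.name);
        writefln("%s: %s", baseName(entry.name), governor is null ? "n/a" : governor);
    }
}
```

A governor such as "powersave" or "ondemand" is a hint that frequency scaling may add noise to the timings; whether it matters depends on the hardware and the BIOS settings mentioned above.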

Understanding the constraints of the problem is very important when profiling code. Knowing whether the code is CPU-bound, memory-bound or I/O-bound can make a big difference in the tools used to profile it, as well as in the optimization techniques that can be applied.
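One rough way to tell CPU-bound code from code that mostly waits (for example on I/O) is to compare the process's CPU time with the elapsed wall-clock time. Below is a minimal D sketch using druntime's core.time clocks; cpuUtilization is a hypothetical helper. The heuristic cannot separate CPU-bound from memory-bound code, because cycles stalled on memory are still counted as CPU time; that distinction needs hardware performance counters.

```d
// Sketch: ratio of process CPU time to wall-clock time while running fn.
// ~1.0 for single-threaded CPU-bound work, near 0 when mostly waiting.
// cpuUtilization is a hypothetical helper, not a library function.
import core.thread : Thread;
import core.time : ClockType, MonoTime, MonoTimeImpl, msecs;
import std.stdio : writefln;

alias CpuClock = MonoTimeImpl!(ClockType.processCPUTime);

double cpuUtilization(scope void delegate() fn)
{
    const wallStart = MonoTime.currTime;
    const cpuStart = CpuClock.currTime;
    fn();
    const cpu = CpuClock.currTime - cpuStart;
    const wall = MonoTime.currTime - wallStart;
    const wallNs = wall.total!"nsecs";
    return wallNs == 0 ? 0.0 : cast(double) cpu.total!"nsecs" / wallNs;
}

void main()
{
    ulong sum;
    // Pure computation: expect a value close to 1.0.
    const busy = cpuUtilization({ foreach (i; 0 .. 200_000_000UL) sum += i; });
    // Sleeping stands in for waiting on I/O: expect a value close to 0.0.
    const idle = cpuUtilization({ Thread.sleep(200.msecs); });
    writefln("busy loop: %.2f, sleep: %.2f (sum = %s)", busy, idle, sum);
}
```

For a multithreaded process the ratio can exceed 1.0, since CPU time is summed over all threads.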

Matt
  • Just curious: do you know if there is anything similar on Windows? – Bauss Jul 23 '15 at 17:41
  • I rarely use Windows systems and have never needed to do this sort of performance testing on one, but I was able to find [some info from Microsoft](https://msdn.microsoft.com/en-us/library/windows/desktop/ms684251(v=vs.85).aspx) about NUMA support, so it seems to be possible. – Matt Jul 23 '15 at 17:44

Set the affinity mask for your main (single?) thread: https://msdn.microsoft.com/en-us/library/windows/desktop/ms686247(v=vs.85).aspx
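A minimal D sketch of that suggestion, assuming a single-threaded benchmark: it calls the Win32 function SetThreadAffinityMask on the current thread with a mask containing only CPU 0, then times the algorithm. The kernel32 prototypes are declared by hand, affinityMaskFor is a hypothetical helper, and on non-Windows systems the pinning step is simply skipped. SetProcessAffinityMask would be the process-wide alternative.

```d
// Sketch: pin the current (main) thread to CPU 0 on Windows with
// SetThreadAffinityMask, then time the algorithm. affinityMaskFor is a
// hypothetical helper; on other systems the pinning step is skipped.
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

version (Windows)
{
    // kernel32 prototypes (HANDLE is a pointer, DWORD_PTR is pointer-sized).
    extern (Windows) nothrow @nogc
    {
        void* GetCurrentThread();
        size_t SetThreadAffinityMask(void* thread, size_t mask);
    }
}

/// Mask with only bit `cpu` set (bit 0 = first logical processor).
size_t affinityMaskFor(uint cpu) pure nothrow @nogc
{
    assert(cpu < size_t.sizeof * 8, "CPU index does not fit in the mask");
    return cast(size_t) 1 << cpu;
}

void algorithm()
{
    writeln("Hello!");
}

void main()
{
    version (Windows)
    {
        // Returns the previous mask, or 0 on failure.
        if (SetThreadAffinityMask(GetCurrentThread(), affinityMaskFor(0)) == 0)
        {
            writeln("SetThreadAffinityMask failed");
            return;
        }
    }
    else
    {
        writeln("Not Windows: running without pinning.");
    }

    auto stopWatch = StopWatch(AutoStart.yes);
    algorithm();
    stopWatch.stop();
    writeln("Hello Duration ==> ", stopWatch.peek());
}
```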

Andriy Tylychko