
Background: The other day, a colleague of mine reported severe CPU load on his computer when running our product (a Windows service written in C#). In the support forum for a ThirdParty component used in our software, he stumbled upon the environment variable OMP_WAIT_POLICY, which should be set to PASSIVE. According to him, that variable was specific to ThirdParty. It helped him cut CPU load by half.

I could hardly believe that because ThirdParty is responsible for less than a fifth of the CPU load of our product. I tested it on my machine, and voilà, the CPU load fell by half.

Now I am trying to find out what happens here. Since there are some Google results for OMP_WAIT_POLICY, it is obvious that this environment variable is not at all specific to ThirdParty. According to the GNU documentation, this variable means: "If the value is PASSIVE, waiting threads should not consume CPU power while waiting."

Since our application is heavily multi-threaded, and its threads spend a lot of time waiting between receiving fresh data sets from hardware, such a change could plausibly have an influence. But that would require that the underlying implementation of .NET is sensitive to that variable, and I failed to find any documentation for that.

At what level of a Windows system does OMP_WAIT_POLICY work?

Bernhard Hiller

2 Answers


First, OpenMP is very far removed from .NET; in fact, this is the only open question on SO tagged with both. So this third-party component uses natively compiled code (C, C++ or Fortran), and that code uses an OpenMP runtime, for instance libgomp. The OpenMP runtime manages the threads used by the third-party code; these are likely native OS threads. The OpenMP runtime may assume that it can run one thread per logical core. In scientific computing, OpenMP is usually run such that each thread has an exclusively dedicated core. In your case, however, it is conceivable that the .NET software and the OpenMP runtime share cores.

OMP_WAIT_POLICY influences how the OpenMP runtime threads used by the third-party component synchronize with each other. An active policy implies that the threads use the CPU while waiting. This results in lower latency when resuming work and, at least from a performance point of view, is often fine if each core is used exclusively by one such thread. In your context it can be detrimental to performance, e.g. if .NET wants to run something on a core where an OpenMP thread is just actively waiting. In a server context where you strive to minimize CPU utilization, you definitely want to use the passive policy.

Note that the default behavior is implementation-defined. For libgomp, it is documented to be active for a certain amount of time and then switch to passive. This time can be tuned via GOMP_SPINCOUNT. If you see detrimental performance with passive, try a lower value of GOMP_SPINCOUNT instead.
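As a rough sketch of how these settings could be applied: environment variables of this kind are normally read when the process (and with it the OpenMP runtime) starts, so they have to be in place before launch. The Python helper below builds such an environment for a child process. The function names `openmp_env` and `run_with_policy` are made up for illustration, and whether GOMP_SPINCOUNT has any effect depends on the third-party component actually using libgomp, which is an assumption here.

```python
import os
import subprocess


def openmp_env(wait_policy="PASSIVE", spin_count=None, base=None):
    """Return a copy of the environment with OpenMP wait settings applied."""
    env = dict(os.environ if base is None else base)
    env["OMP_WAIT_POLICY"] = wait_policy
    if spin_count is not None:
        # GOMP_SPINCOUNT is a libgomp setting; other runtimes may not honor it.
        env["GOMP_SPINCOUNT"] = str(spin_count)
    return env


def run_with_policy(command, **settings):
    """Start `command` (hypothetical) with the settings visible from startup."""
    return subprocess.run(command, env=openmp_env(**settings), check=True)


if __name__ == "__main__":
    # Show the two variables that a child process would inherit.
    env = openmp_env("PASSIVE", spin_count=1000)
    print(env["OMP_WAIT_POLICY"], env["GOMP_SPINCOUNT"])
```

For a Windows service, the equivalent would be to set the variable system-wide or in the service's own environment before the service starts, rather than from inside the running process.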

Zulan
  • That means you think the reduction of CPU usage in our application is due to a reduced interference with ThirdParty such that our application can use the CPU cores while ThirdParty is in a waiting state? That would amount to an astonishingly big interference, because ThirdParty requires far less CPU than the rest of our application (well, that was determined by looking at the task manager only, not a real performance measurement). – Bernhard Hiller Jun 21 '17 at 12:21
  • There is no way to discuss more specifically what happens in your scenario without a much more detailed description of methodical observation on your system. – Zulan Jun 21 '17 at 13:32
  • @BernhardHiller, most OpenMP runtimes implement barriers using spin waiting. The alternative is to use OS-specific events. Since those are typically kernel objects, there is a much higher latency involved. Thus, some OpenMP runtimes like GOMP take a hybrid approach - spin-wait for a certain amount of spins, then switch to using OS events. – Hristo Iliev Jun 21 '17 at 14:40
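To make the hybrid approach from the comment above concrete, here is a minimal Python sketch of its control flow: poll a flag for a bounded number of iterations, then fall back to blocking on an OS-backed event. The class `HybridWaiter` and its `spin_count` parameter are invented for illustration; this is not how libgomp is implemented, only the same idea at small scale.

```python
import threading


class HybridWaiter:
    """Spin briefly on a flag, then block on an OS-backed event."""

    def __init__(self, spin_count=10_000):
        self.spin_count = spin_count
        self._event = threading.Event()

    def signal(self):
        """Wake up the waiting thread."""
        self._event.set()

    def wait(self):
        """Return 'spin' if woken while polling, 'block' otherwise."""
        # Active phase: poll the flag, burning CPU but reacting quickly.
        for _ in range(self.spin_count):
            if self._event.is_set():
                return "spin"
        # Passive phase: sleep in the OS until signalled, using no CPU.
        self._event.wait()
        return "block"


if __name__ == "__main__":
    waiter = HybridWaiter(spin_count=0)  # spin count 0: purely passive waiting
    threading.Timer(0.05, waiter.signal).start()
    print(waiter.wait())
```

A large `spin_count` corresponds to an active policy and keeps a core busy for the duration of the polling loop; a `spin_count` of zero corresponds to a purely passive policy.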

It's a typical problem of OpenMP not cooperating with another threading model. OpenMP keeps a logical processor captive until the spin count expires. On a platform where libgomp supports OMP_PLACES=cores, setting that may help.

tim18
  • OMP_PLACES=cores with a corresponding num_threads gives that other threading model some resources even before the spin count is consumed. – tim18 Jun 21 '17 at 13:15
  • That is, you too think that Microsoft did not build any genuine sensitivity to that environment variable into their .NET Framework (at least some parts of it need to be compiled natively), and that it is only the non-cooperativeness with other threading models which causes those enormous CPU hits. By the way, according to https://stackoverflow.com/questions/34852911/set-thread-affinity-on-two-cores-using-openmp "OpenMP thread affinity is not supported on Windows 7 machines". – Bernhard Hiller Jun 22 '17 at 08:36
  • Neither libgomp nor Microsoft's OpenMP supports affinity on Windows. But the problem of non-cooperating threading models is more widespread, e.g. between OpenMP and Cilk Plus. – tim18 Jun 22 '17 at 17:55