
I have written a small test program in which I use the Windows API call SetThreadAffinityMask to try to lock a thread to a single NUMA node. I retrieve the CPU bitmask of that node with the GetNumaNodeProcessorMask API call, then pass the bitmask to SetThreadAffinityMask along with the thread handle returned by GetCurrentThread. Here is a greatly simplified version of my code:

#include <windows.h> // GetNumaNodeProcessorMask, SetThreadAffinityMask

// Inside a function called from a boost::thread
unsigned long long nodeMask = 0;
GetNumaNodeProcessorMask(1, &nodeMask);  // bitmask of CPUs in NUMA node 1
HANDLE thread = GetCurrentThread();      // pseudo-handle for the calling thread
SetThreadAffinityMask(thread, nodeMask); // restrict the thread to that node
DoWork(); // make-work function

I of course check whether the API calls return 0 in my real code, and I've printed out the NUMA node mask; it is exactly what I would expect. Following advice given elsewhere, I've also printed the value returned by a second, identical call to SetThreadAffinityMask (it returns the thread's previous affinity mask), and it matches the node mask.
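
For reference, that verification looks roughly like this (a sketch; SetThreadAffinityMask returns the mask that was in effect before the call, so a second identical call echoes back what the first one set):

DWORD_PTR first  = SetThreadAffinityMask(thread, nodeMask); // returns the prior (default) mask
DWORD_PTR second = SetThreadAffinityMask(thread, nodeMask); // returns the mask set above
printf("node mask: %llx  reported mask: %llx\n",
       (unsigned long long)nodeMask, (unsigned long long)second);
// second == nodeMask confirms the first call took effect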

However, watching Resource Monitor while DoWork executes, I see the work split among all cores instead of only those the thread is ostensibly bound to. Are there any trip-ups I may have missed when using SetThreadAffinityMask? I am running Windows 7 Professional 64-bit, and DoWork contains a loop parallelized with OpenMP that performs operations on the elements of three very large arrays (which, combined, still fit in the node's local memory).
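
For context, DoWork has roughly this shape (a sketch; the array names, N, and the per-element operation are placeholders for the real computation):

void DoWork()
{
    // a, b, c are the three very large arrays, N their length
    #pragma omp parallel for
    for (int i = 0; i < N; ++i)
        c[i] = a[i] * b[i]; // stands in for the real per-element work
}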

Edit: To expand on the answer given by David Schwartz: on Windows, threads spawned by OpenMP do NOT inherit the affinity of the thread that spawned them. The problem lies there, not with SetThreadAffinityMask.
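
One workaround (a sketch, assuming nodeMask is visible inside the parallel region) is to have every OpenMP worker bind itself at the top of the region, since the binding of the spawning thread is not inherited:

#pragma omp parallel
{
    // Each worker binds itself; repeated calls on reused pool threads are harmless.
    SetThreadAffinityMask(GetCurrentThread(), nodeMask);

    // ... the parallelized work on the arrays ...
}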

ahelwer

1 Answer


Did you confirm that the particular thread whose affinity mask you set was running on a core in another NUMA node? Otherwise, it's working as intended: you set the processor mask on one thread and then observed the behavior of a group of threads.
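
One way to check, from inside the thread whose mask you set (a sketch; GetCurrentProcessorNumber reports the core the calling thread is currently running on, and nodeMask is the mask you passed to SetThreadAffinityMask):

DWORD cpu = GetCurrentProcessorNumber();
if (((1ull << cpu) & nodeMask) == 0)
    printf("thread escaped the node: running on CPU %lu\n", cpu);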

David Schwartz
  • Yes. I spawned only one thread off the main thread with boost::thread, which called the function that bound it to the NUMA node and then called DoWork(). Inside DoWork() OpenMP threads were spawned, which should have the same affinity as the parent thread. But the single parent thread, which was supposed to be bound to 6 of the 24 cores, used all 24 cores inside DoWork(). – ahelwer Jan 24 '12 at 00:14
  • There is no such thing as a "parent thread". Threads are process resources. Likely your OpenMP implementation isn't creating threads but reusing existing ones. Check the thread whose mask you set -- I'll bet you that *it* is where you told it to be. – David Schwartz Jan 24 '12 at 00:16
  • From the OpenMP standard: "parent thread: The thread that encountered the parallel construct and generated a parallel region is the parent thread of each of the threads in the team of that parallel region. The master thread of a parallel region is the same thread as its parent thread with respect to any resources associated with an OpenMP thread." Hmm, so maybe thread 0 is the only one with affinity set. I'll check. – ahelwer Jan 24 '12 at 00:19
  • I mean that, to the scheduler, there's no such thing. There's no guarantee that the threads are created just for the construct. – David Schwartz Jan 24 '12 at 00:20
  • Or, to phrase it better, the binding between threads and teams is dynamic. Also, the relationship between OpenMP threads and OS threads is unspecified. – David Schwartz Jan 24 '12 at 00:23
  • All right. So I spawned a bunch of boost threads, bound them all to the same node, and removed the OpenMP parallelization (sketched below). The threads all execute within the cores they are allowed. From this we can infer that OpenMP threads on Windows do NOT inherit the affinity mask of the thread that spawns them. This does happen on Linux, so I was expecting it on Windows as well. – ahelwer Jan 24 '12 at 00:34
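
For reference, that experiment looks roughly like this (a sketch; Worker, the thread count of 6, and the serial DoWork are illustrative):

#include <windows.h>
#include <boost/thread.hpp>

void Worker()
{
    // Each thread binds itself to node 1, then works directly --
    // no OpenMP involved, so affinity inheritance never comes into play.
    unsigned long long nodeMask = 0;
    GetNumaNodeProcessorMask(1, &nodeMask);
    SetThreadAffinityMask(GetCurrentThread(), nodeMask);
    DoWork(); // serial version of the work
}

int main()
{
    boost::thread_group group;
    for (int i = 0; i < 6; ++i) // one thread per core in the node
        group.create_thread(&Worker);
    group.join_all();
    return 0;
}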