33

I want to create a C++11 thread that runs on my first core. I found that pthread_setaffinity_np and sched_setaffinity can change the CPU affinity of a thread and migrate it to the specified CPU. However, with these calls the affinity is only changed after the thread has already started running.

How can I create a C++11 thread with specific CPU affinity (a cpu_set_t object)?

If it is impossible to specify the affinity when initializing a C++11 thread, how can I do it with pthread_t in C?

My environment is G++ on Ubuntu. A piece of code is appreciated.

Carlos
Peng Zhang
  • 3
    Sorry to say I don't think C++11 supports this (presumably due to portability concerns) - you may have to ditch `std::thread` and start the thread with `pthread_create` and an attribute you've prepared with [`pthread_attr_setaffinity_np`](http://man7.org/linux/man-pages/man3/pthread_attr_setaffinity_np.3.html), or use `std::thread` and have the created thread immediately set its own affinity (avoiding the race condition you'd have if you tried to set it from the creating thread). – Tony Delroy Jul 09 '14 at 05:42
  • @TonyD Thanks a lot. I added the code in the answer. Hope that is what you suggested. – Peng Zhang Jul 09 '14 at 06:38
  • looks about right... cheers. – Tony Delroy Jul 09 '14 at 06:39

4 Answers

39

I am sorry to be the "myth buster" here, but setting thread affinity matters a great deal, and it matters more over time as the systems we all use become more and more NUMA (Non-Uniform Memory Access) by nature. Even a trivial dual-socket server these days has RAM connected separately to each socket, and the difference between a socket accessing its own RAM and accessing the RAM of the neighboring socket (remote RAM) is substantial. In the near future, processors will hit the market in which the internal set of cores is itself NUMA (separate memory controllers for separate groups of cores, etc.). There is no need for me to repeat the work of others here; just search for "NUMA and thread affinity" online and you can learn from years of experience of other engineers.

Not setting thread affinity is effectively "hoping" that the OS scheduler will handle it correctly. Let me explain: you have a system with several NUMA nodes (processing and memory domains). You start a thread, and the thread does some work with memory, e.g. mallocs some memory and then processes it. Modern OSes (at least Linux, others probably too) do a good job up to this point: by default, memory is allocated (if available) from the same domain as the CPU on which the thread is running. In time, the time-sharing OS (all modern OSes) will put the thread to sleep. When the thread becomes runnable again, it may be scheduled on any core in the system (since you did not set an affinity mask for it), and the larger your system is, the higher the chance that it will be "woken up" on a CPU that is remote from the memory it previously allocated or used. From then on, its accesses to that memory are remote. (Not sure what this means for your application's performance? Read more about remote memory access on NUMA systems online.)

So, to summarize, affinity-setting interfaces are VERY important when running code on systems with a more-than-trivial architecture, which is rapidly becoming "any system" these days. Some thread runtime environments/libraries allow this to be controlled at runtime without any specific programming (see OpenMP, for example the KMP_AFFINITY environment variable in Intel's implementation), and it would be the right thing for C++11 implementers to include similar mechanisms in their runtime libraries and language options. Until then, if your code is aimed at servers, I strongly recommend that you implement affinity control in your code.
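
To see what "no affinity mask set" means in practice, a thread can ask the kernel which CPUs it is currently allowed to run on. The following is only a minimal sketch, assuming Linux with glibc and g++ (which defines _GNU_SOURCE by default, making the CPU_* macros visible); `allowed_cpus` is a hypothetical helper name, not part of any library.

```cpp
#include <sched.h>   // sched_getaffinity, cpu_set_t, CPU_* macros (Linux/glibc)
#include <vector>

// Hypothetical helper: returns the CPUs the calling thread may run on,
// or an empty vector if the query fails.
std::vector<int> allowed_cpus() {
  cpu_set_t set;
  CPU_ZERO(&set);
  // A pid of 0 means "the calling thread".
  if (sched_getaffinity(0, sizeof(set), &set) != 0) {
    return {};
  }
  std::vector<int> cpus;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (CPU_ISSET(cpu, &set)) {
      cpus.push_back(cpu);
    }
  }
  return cpus;
}
```

On an unpinned thread this typically lists every CPU the process may use, which is exactly the freedom the scheduler has when it wakes the thread up again.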

Benzi Galili
  • 1
    +1 Windows is a good example of how "hoping the OS will do it right" can go wrong. At least under Win7 (I haven't tried 8 or 10) threads end up assigned a preferred core round-robin. Which is very simple and just good enough 95% of the time. But the remaining 5% it's really, really bitter. – Damon Nov 17 '14 at 15:51
  • This only applies to processes with easily predictable work-loads. You will need to be much smarter to beat near-future schedulers and their optimizers which are more and more likely to gain performance from recent advances in AI. – Domi Dec 23 '16 at 15:36
  • 10
    This appears in no way to answer the question. – Tommy Jul 25 '19 at 21:01
26

Yes, there is a way to do it. I came across this method on this blog link.

I rewrote the code from Eli Bendersky's blog (the link is pasted above). You can save the code below to test.cpp, then compile and run it:

// g++ -std=c++11 -pthread ./test.cpp && ./a.out
//
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>
#include <pthread.h>
#include <sched.h>
int main(int argc, const char** argv) {
  constexpr unsigned num_threads = 4;
  // A mutex ensures orderly access to std::cout from multiple threads.
  std::mutex iomutex;
  std::vector<std::thread> threads(num_threads);
  for (unsigned i = 0; i < num_threads; ++i) {
    threads[i] = std::thread([&iomutex, i] {
      // Create a cpu_set_t object representing a set of CPUs. Clear it and mark
      // only CPU i as set.
      cpu_set_t cpuset;
      CPU_ZERO(&cpuset);
      CPU_SET(i, &cpuset);
      // Use pthread_self() rather than threads[i].native_handle(): the main
      // thread may still be assigning threads[i] when this lambda starts.
      int rc = pthread_setaffinity_np(pthread_self(),
                                      sizeof(cpu_set_t), &cpuset);
      if (rc != 0) {
        std::cerr << "Error calling pthread_setaffinity_np: " << rc << "\n";
      }
      std::this_thread::sleep_for(std::chrono::milliseconds(20));
      while (1) {
        {
          // Use a lexical scope and lock_guard to safely lock the mutex only
          // for the duration of std::cout usage.
          std::lock_guard<std::mutex> iolock(iomutex);
          std::cout << "Thread #" << i << ": on CPU " << sched_getcpu() << "\n";
        }

        // Simulate important work done by the thread by sleeping for a bit...
        std::this_thread::sleep_for(std::chrono::milliseconds(900));
      }
    });


  }

  for (auto& t : threads) {
    t.join();
  }
  return 0;
}

Y00
3

In C++11 you cannot set the thread affinity when the thread is created (unless the function being run in the thread does it itself). Once the thread is created, however, you can set the affinity through whatever native interface you have by getting the thread's native handle (thread.native_handle()). On Linux you can get the pthread ID via:

pthread_t my_thread_native = my_thread.native_handle();

Then you can use any of the pthread calls passing in my_thread_native where it wants the pthread thread id.

Note that most thread facilities are implementation specific: pthreads, Windows threads, and the native threads of other OSes all have their own interfaces and types, so this portion of your code would not be very portable.

diverscuba23
-11

After searching for a while, it seems that we cannot set the CPU affinity at the moment we create a C++ thread.

The reason is that, there is NO NEED to specify the affinity when create a thread. So, why bother make it possible in the language.

Say, we want the workload f() to be bound to CPU0. We can just change the affinity to CPU0 right before the real workload by calling pthread_setaffinity_np.
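
One possible way to sketch this with std::thread, assuming Linux with glibc and g++: the new thread pins itself first and only then runs the workload. `spawn_pinned` is a hypothetical helper name used for illustration.

```cpp
#include <iostream>
#include <pthread.h>  // pthread_self, pthread_setaffinity_np (GNU extension)
#include <sched.h>    // cpu_set_t, CPU_ZERO, CPU_SET
#include <thread>

// Hypothetical helper: starts a thread that binds itself to `cpu`
// before calling `work`. If pinning fails, the error is reported and
// the workload still runs, just unpinned.
template <typename F>
std::thread spawn_pinned(int cpu, F work) {
  return std::thread([cpu, work]() mutable {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // pthread_self() is always valid inside the thread itself.
    int rc = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (rc != 0) {
      std::cerr << "Error calling pthread_setaffinity_np: " << rc << "\n";
    }
    work();  // the real workload, e.g. f()
  });
}
```

With this, `spawn_pinned(0, f).join();` would run f() on CPU0, although the thread's first few instructions may still execute on whichever CPU the scheduler picked at creation time.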

However, we CAN specify the affinity when creating a thread in C (thanks to the comment from Tony D). For example, the following code outputs "Hello pthread".

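// g++ -pthread test.cpp && ./a.out
#include <iostream>
#include <pthread.h>
#include <sched.h>

void *f(void *p) {
  std::cout << "Hello pthread" << std::endl;
  return nullptr;  // a function returning void* must return a value
}

int main() {
  // Build a CPU set containing only CPU 0.
  cpu_set_t cpuset;
  CPU_ZERO(&cpuset);
  CPU_SET(0, &cpuset);
  // Attach the CPU set to the attributes used to create the thread.
  pthread_attr_t pta;
  pthread_attr_init(&pta);
  pthread_attr_setaffinity_np(&pta, sizeof(cpuset), &cpuset);
  pthread_t thread;
  if (pthread_create(&thread, &pta, f, NULL) != 0) {
    std::cerr << "Error in creating thread" << std::endl;
  } else {
    pthread_join(thread, NULL);  // only join a thread that was created
  }
  pthread_attr_destroy(&pta);
  return 0;
}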
Peng Zhang
  • 5
    Bad answer. You are using bold caps to make overly-wide assertions about use cases. pthread_setaffinity_np changes the thread of the currently running process, causing an unnecessary context switch. – ACyclic Oct 21 '15 at 23:06
  • @ACyclic, what do you mean "changes the thread of the currently running process"? Also, note, from the man page: The pthread_setaffinity_np() function sets the CPU affinity mask of the thread thread to the CPU set pointed to by cpuset – aho Jun 21 '16 at 01:25
  • "The reason is that, there is NO NEED to specify the affinity when create a thread. So, why bother make it possible in the language."... so amusing. – Alex Jul 12 '16 at 00:20
  • 3
    @aho I'll clarify a use case. There are systems such as embedded or real time priority systems where pre-emptive multithreading is not possible or desirable. This C++ thread api requires the thread to start on the same hardware core/hyperthread as the spawning process (technically, it doesn't allow you to pre-specify the CPU affinity before starting the thread). Thus it assumes the availability of pre-emptive multithreading. If the current thread is running with real-time priority, the new thread will never start. If this restriction doesn't apply to you, then use the C++ thread API. – ACyclic Aug 26 '16 at 15:44
  • With the edit this answer becomes far more useful. It's the only one allowing to specify the cpu affinity before thread creation, something that can alleviate a possible context switch, as well as a potential race condition. – lennartVH01 Jul 24 '22 at 14:04