7

I am using Visual Studio 2012. I have a module where I have to read a huge set of files from the hard disk, after obtaining their paths by traversing an XML file. For this I am doing

std::vector<std::thread> m_ThreadList;

In a while loop I am pushing back a new thread into this vector, something like

m_ThreadList.push_back(std::thread(&MyClass::Readfile, &MyClassObject, filepath, std::ref(polygon)));

My C++11 multithreading knowledge is limited. The question that I have here is: how do I create a thread on a specific core? I know of parallel_for and parallel_for_each in VS2012, which make optimum use of the cores. But is there a way to do this using standard C++11?

Cody Gray - on strike
Atul
  • The C++ thread functions have no knowledge of "cores", and therefore you can't bind a thread to a specific core. – Some programmer dude Apr 04 '13 at 04:04
  • I don't really know, but I am guessing [`SetThreadAffinityMask`](http://msdn.microsoft.com/en-us/library/ms686247(v=vs.85).aspx) and the handle returned by [`std::thread::native_handle()`](http://en.cppreference.com/w/cpp/thread/thread/native_handle) might be the closest you can get. (But I agree that there is no way to do this purely within the C++11 Standard, i.e. without platform-specific calls.) – jogojapan Apr 04 '13 at 04:04
  • 7
    And as a general tip, if you use multiple threads to read from multiple files on the same filesystem, you most likely will make the program _slower_ due to the operating system having to seek back and forth every time there is a thread context switch. (At least on old mechanical hard-drives, modern SSDs should handle this better.) – Some programmer dude Apr 04 '13 at 04:05
  • Do you really mean *"a specific core"*? Such as "I want this thread to specifically be on core #2"? – Drew Dormann Apr 04 '13 at 04:10
  • @DrewDormann: I have four cores. Ideally, I want the threads to be created on whichever core is least utilized – Atul Apr 04 '13 at 04:11
  • @Atul if you want to manage that, search around for "thread affinity" – Drew Dormann Apr 04 '13 at 04:14
  • @JoachimPileborg: What is the efficient way that you suggest in this case? – Atul Apr 04 '13 at 04:14
  • 2
    [std::thread::hardware_concurrency](http://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency) could help you. – Mark Garcia Apr 04 '13 at 04:21
  • 2
    If this is really just about *"I don't want my 4 threads to all end up on the same core"*, then the easy answer is *"C'mon, you're using an operating system that is perfectly aware of multi-core processors and isn't that dumb"*. But if the question is *"I want thread 2 to be run on core 3"* (for whatever reason this should be necessary on a multi-core in a *"normal"* application), then you're at the mercy of platform-dependent functionality. – Christian Rau Apr 04 '13 at 07:49
  • 3
    By the way, so if you have 100 files you're gonna start 100 threads? Good luck. – Christian Rau Apr 04 '13 at 07:52
  • 'core which is less utilized' - since this can change in an unpredictable way at almost any time, you will have a problem. Just leave it to the OS, as others have suggested. – Martin James Apr 04 '13 at 09:25
  • The OpenMP library is suitable for controlling threads/cores. – Sorush Nov 01 '21 at 23:03

3 Answers

5

As pointed out in other comments, you cannot create a thread "on a specific core", as C++ has no knowledge of such architectural details. Moreover, in the majority of cases, the operating system will be able to manage the distribution of threads among cores/processors well enough.

That said, there exist cases in which forcing a specific distribution of threads among cores can be beneficial for performance. As an example, by forcing a thread to execute on one specific core it might be possible to minimise data movement between different processor caches (which can be critical for performance in certain memory-bound scenarios).

If you want to go down this road, you will have to look into platform-specific routines. E.g., on GNU/Linux with POSIX threads you will want pthread_setaffinity_np(), on FreeBSD cpuset_setaffinity(), on Windows SetThreadAffinityMask(), etc.
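Since the question mentions Visual Studio 2012, here is a minimal Windows-flavoured sketch of the idea: it pins a std::thread to core #0 through its native handle. It assumes that std::thread::native_handle() yields the underlying Win32 thread HANDLE on this implementation (true for the MSVC runtime); the Work function is just a placeholder.

```cpp
#include <windows.h>
#include <thread>
#include <iostream>

void Work()
{
    // ... read a file, fill a polygon, etc.
}

int main()
{
    std::thread t(Work);

    // Bit 0 set => the thread may only be scheduled on core #0.
    DWORD_PTR mask = 1;
    if (SetThreadAffinityMask(t.native_handle(), mask) == 0)
        std::cerr << "SetThreadAffinityMask failed: " << GetLastError() << '\n';

    t.join();
}
```

The GNU/Linux equivalent would fill a cpu_set_t and pass it, together with the thread's native handle, to pthread_setaffinity_np().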

I have some relevant code snippets here if you are interested:

http://gitorious.org/piranhapp0x/mainline/blobs/master/src/thread_management.cpp

bluescarni
  • @Atul, Let us know whether you do actually get a performance improvement over just letting the OS do the allocation for itself. – bazza Apr 05 '13 at 04:17
  • It looks like the new link for the "relevant code snippets" are here on github now: https://github.com/bluescarni/piranha/blob/master/include/piranha/thread_management.hpp – rob3c Apr 15 '18 at 05:30
2

I'm fairly certain that core affinity isn't included in std::thread. The assumption is that the OS is perfectly capable of making the best possible use of the cores available. In all but the most extreme of cases you're not going to beat the OS's decision, so the assumption is a fair one.

If you do go down that route then you have to add some decision making to your code to take account of the machine architecture, to ensure that your decision is better than the OS's on every machine you run on. That takes a lot of effort! For starters you'll want to limit the number of threads to match the number of cores on the computer. And you don't have any knowledge of what else is going on in the machine; the OS does!

Which is why thread pools exist. They tend by default to have as many threads as there are cores, automatically set up by the language runtime. AFAIK C++11 doesn't have one of those. So the one good thing you can do to get the optimum performance is to find out how many cores there are and limit the number of threads you have to that number. Otherwise it's probably just best to trust the OS.
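As a minimal sketch of that last suggestion (names such as Readfile and filepaths are stand-ins for the asker's own code, not anything given in the question): cap the worker count at std::thread::hardware_concurrency() and give each worker a strided share of the file list, rather than starting one thread per file.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

void Readfile(const std::string& path); // placeholder for MyClass::Readfile

void ProcessAll(const std::vector<std::string>& filepaths)
{
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 2;   // hardware_concurrency() may legitimately return 0

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i)
    {
        workers.push_back(std::thread([&filepaths, i, n]
        {
            // Worker i handles files i, i+n, i+2n, ...
            for (std::size_t f = i; f < filepaths.size(); f += n)
                Readfile(filepaths[f]);
        }));
    }

    for (auto& w : workers)
        w.join();
}
```

Which core each worker actually runs on is still the OS's decision, which is usually exactly what you want.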

Joachim Pileborg's comment is well worth paying attention to, unless the work done by each thread outweighs the I/O overhead.

bazza
  • Is it fine if, after populating the vector, I run a parallel_for_each loop to join the threads? – Atul Apr 04 '13 at 06:28
  • @Atul: almost certainly not. the action that you take in the function_object of a parallel_for_each (or parallel_for) should, in general, not perform synchronization operations with other threads. In fact, parallel_for and parallel_for_each usually get quite confused (perform much worse) if you are doing any thread creation of your own at all. – Wandering Logic Apr 04 '13 at 12:33
  • ... you might, on the other hand, consider just making a vector of objects (containing the filepaths you need processed) and then using parallel_for_each over the vector. parallel_for_each _is_ carefully designed to use a thread pool with an optimal number of threads and carefully designed affinity optimizations. – Wandering Logic Apr 04 '13 at 12:35
0

As a quick overview of threading in the context of dispatching threads to cores:

Most modern OSes make use of kernel-level threads, or a hybrid model. With kernel-level threading, the OS "sees" all the threads in each process; this is in contrast to user-level threads (such as the "green threads" of early Java implementations), where the OS sees a single process and has no knowledge of its threads. Because, with kernel-level threading, the OS can recognise the separate threads of a process and manages their dispatch onto the cores, there is the potential for true parallelism: multiple threads of the same process running on different cores. You, as the programmer, have no control over this when employing std::thread; the OS decides. With user-level threading, all the management of threads is done at the user level, and a runtime library manages the "dispatch". In the hybrid case, kernel threads are used, and each kernel thread carries a set of user-level threads.

izaak_pyzaak