3

Suppose I have a multi-threaded application (say ~40 threads) running on a multiprocessor system (say 8 cores) with Linux as the operating system, where the different threads are essentially LWPs (Light Weight Processes) scheduled by the kernel.

What would be the benefits/drawbacks of using CPU affinity? Would CPU affinity help by localizing the threads to a subset of cores, thus minimizing cache sharing/misses?
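
For concreteness, this is the kind of call I have in mind - a sketch using `pthread_setaffinity_np`, where the subset of cores (0-3) is just a placeholder:

```c
/* Sketch: restrict the calling thread to cores 0-3 (placeholder subset).
 * Linux-specific; _GNU_SOURCE is needed for pthread_setaffinity_np. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static int pin_to_subset(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int core = 0; core < 4; core++)   /* cores 0-3: example subset */
        CPU_SET(core, &set);

    int err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (err != 0)
        fprintf(stderr, "pthread_setaffinity_np failed: %d\n", err);
    return err;
}
```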

Vivek Gupta
  • You mean 40 threads that are continually ready? I don't have any real apps where this happens (except maybe overloaded DB servers in the field, and I don't think that twiddling affinity is going to help there). Why not try it with a test app and tell us? – Martin James Jan 05 '13 at 09:06
  • You say `multiprocessor system (say 8 cores)`. Is it one processor with 8 cores, or is it 2 processors with 4 cores each? The number of logical processors (cores) may be 8 in both cases. By CPU affinity, do you mean setting the affinity of a thread/process to a specific core? Please clarify. – Arno Jan 05 '13 at 10:59

3 Answers

2

If you use strict affinity, then a particular thread MUST run on that processor (or set of processors). If you have many threads that work completely independently, and they work on chunks of memory larger than a few kilobytes, then it's unlikely you'll benefit much from running on one particular core - since it's quite possible that the other threads running on that CPU will have thrown out any L1 cache content, and quite possibly the L2 cache too. Which is more important for performance - cache content or "getting to run sooner"? Are some CPUs always idle, or is the CPU load 100% on every core?
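
To make "strict affinity" concrete, here's a minimal sketch (Linux-specific; the core number and `worker` function are placeholders) that creates a thread which can only ever be scheduled on one given core, even when every other core is idle:

```c
/* Sketch: create a worker with strict affinity to a single core,
 * set via the thread attributes before the thread first runs. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

extern void *worker(void *arg);   /* hypothetical thread function */

static int spawn_pinned(pthread_t *tid, int core, void *arg)
{
    cpu_set_t set;
    pthread_attr_t attr;

    CPU_ZERO(&set);
    CPU_SET(core, &set);          /* exactly one core in the mask */

    pthread_attr_init(&attr);
    int err = pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
    if (err == 0)
        err = pthread_create(tid, &attr, worker, arg);
    pthread_attr_destroy(&attr);
    return err;
}
```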

However, only you know (until you tell us) what your threads are doing. How big is the "working set" - how much memory (code and data) do they touch each time they get to run? How long does each thread run when it gets to run? What is the interaction with other threads? Are other threads using shared data with "this" thread? How much, and what is the pattern of sharing?

Finally, the ultimate answer is "What makes it run faster?" - an answer you can only find by having good (realistic) benchmarks and trying the different possible options. Even if you gave us every single line of code, running time measurements for each thread, and so on, we could only make more or less sophisticated guesses - until the options have been tried and tested (with VARYING usage patterns), it's almost impossible to know.
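
One low-effort way to try the options is to leave the code unchanged and vary the mask from outside, e.g. `taskset -c 0-3 ./bench` versus plain `./bench`, around a crude timing harness like this sketch (`run_workload` is a placeholder for whatever your threads actually do):

```c
/* Sketch: time the whole run; launch the binary under different
 * affinity masks with taskset and compare the elapsed times. */
#include <stdio.h>
#include <time.h>

extern void run_workload(void);   /* hypothetical: starts threads, joins them */

int main(void)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    run_workload();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("elapsed: %.3f s\n", secs);
    return 0;
}
```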

In general, I'd suggest that having many threads suggests either that each thread isn't very busy (CPU-wise), or that you are "doing it wrong"... More threads aren't better if they are all running flat out - it's better to have fewer threads in that case, because they will just fight each other.

Mats Petersson
1

The scheduler already tries to keep threads on the same cores, and to avoid migrations. This suggests that there's probably not a lot of mileage in managing thread affinity manually, unless:

  • you can demonstrate that for some reason the kernel is doing a bad job for your particular application (see the diagnostic sketch after this list); or
  • there's some specific knowledge about your application that you can exploit to good effect.
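
As a starting point for demonstrating the former, a small Linux-specific diagnostic sketch that reports every time the calling (deliberately busy) thread is moved to a different core:

```c
/* Sketch: poll sched_getcpu() in a busy loop and report each
 * migration, to see how often the kernel actually moves the thread. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    int last = -1;
    for (unsigned long i = 0; i < 1000000000UL; i++) {  /* busy on purpose */
        int cpu = sched_getcpu();
        if (cpu != last) {
            printf("iteration %lu: now on CPU %d\n", i, cpu);
            last = cpu;
        }
    }
    return 0;
}
```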
NPE
  • I think the scheduler actually tries not to overheat cores, so it tries to move threads between cores (if you run one busy-waiting thread it will migrate between many cores instead of taking 100% of one core) – Oleg Vazhnev Sep 21 '14 at 16:12
1

> localizing the threads to a subset of cores thus minimizing cache sharing/misses

Not necessarily; you have to consider cache coherence too. If two or more threads access a shared memory buffer, and each one is bound to a different CPU core, their caches have to be kept in sync: when one thread writes to a shared cache line, there is significant overhead in invalidating the copies in the other cores' caches.
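
This write-sharing overhead is easy to reproduce as "false sharing". A minimal sketch (it assumes a 64-byte cache line, which is common but not universal; compile with `-pthread`): two threads incrementing neighbouring counters in the same line run markedly slower than the same threads using the padded counters, even though neither ever reads the other's data.

```c
/* Sketch of false sharing: hot[0] and hot[1] typically sit in one
 * cache line, so each write invalidates the other core's copy of it.
 * The cool[] counters are aligned to one (assumed 64-byte) line each. */
#include <pthread.h>
#include <stdint.h>

static uint64_t hot[2];                              /* shared line */
static struct { _Alignas(64) uint64_t v; } cool[2];  /* one line each */

static void *bump(void *arg)
{
    uint64_t *p = arg;
    for (long i = 0; i < 100000000L; i++)
        (*p)++;                     /* time this with hot[] vs cool[].v */
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    pthread_create(&t[0], NULL, bump, &hot[0]);   /* swap in &cool[0].v */
    pthread_create(&t[1], NULL, bump, &hot[1]);   /* swap in &cool[1].v */
    pthread_join(t[0], NULL);
    pthread_join(t[1], NULL);
    return 0;
}
```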

iabdalkader