8

When I run my multi-threaded code, the system (Linux) sometimes moves threads from one processor to another. Since I have as many threads as I have processors, this invalidates caches for no good reason and confuses my tracing.

Do you know how to bind threads to processors, and why a system would do this?

Ben
  • 1
    Note that a "do_not_migrate" thread attribute would also work ... – Ben Sep 22 '09 at 08:46
  • Collecting jobs on one processor and leaving the other processor without work may give better power saving. – sambowry Sep 22 '09 at 11:34
  • @sambowry: it runs on a 24-processor machine; it would be quite a waste of energy to use only 1 core out of 24 and keep the machine up 24 times longer :/ – Ben Sep 22 '09 at 13:11
  • Emmm, I just wonder how you see "the system (linux) sometimes moves the threads from one processor to another", could you tell me? thanks a lot. – Hu Xixi Aug 10 '20 at 09:18

2 Answers

20

Use sched_setaffinity (this is Linux-specific).

Why would a scheduler switch threads between different processors? Well, imagine that your thread last ran on processor 1 and is currently waiting to be scheduled for execution again. In the meantime, a different thread is currently running on processor 1, but processor 2 is free. In this situation, it's reasonable for the scheduler to switch your thread to processor 2. However, a sophisticated scheduler will try to avoid "bouncing" a thread between processors more than necessary.

Martin B
  • I would expect the scheduler to avoid this if there are fewer threads than processors ... – Ben Sep 22 '09 at 08:58
  • 1
    That's true... you said in your question, though, that you have "many more threads than processors" -- was that supposed to be the other way round? – Martin B Sep 22 '09 at 09:04
  • No, it is because knittl edited my question without knowing what I was talking about; I have as many threads as I have processors – Ben Sep 22 '09 at 09:42
  • 6
    @Ben: unless you have written your own OS, there are not "less threads than processors". Other things on the system are continually taking timeslices, and this can cause some or all of your threads to be descheduled. When they come to be rescheduled, the core on which they last ran may or may not be available. – Steve Jessop Sep 22 '09 at 09:46
  • 1
    @Ben: (+onebyone) For the scheduler, response time is as important as cache efficiency. Note that by constraining threads to specific CPU cores, the response time may go up a lot. Also, the system will be less portable. Check whether this is worth the gains. – Adriaan Sep 22 '09 at 09:50
9

You can do this from bash. There is a wonderful taskset command that I learned about in this question (you may also find a valuable discussion there on how a scheduler should operate). The command takes the pid of a process and binds it to the specified processor(s).

taskset -cp 0 PID

binds the process with PID to processor (core) number 0.

What does this have to do with threads? On Linux, each thread is assigned its own identifier, known as the "tid", which taskset accepts in place of a pid. You can get it with the gettid syscall. You can also see it in the top program by pressing H (some processes will split into many seemingly identical entries with different pids; those are threads).

P Shved