I have a C program with the following overall structure:

while (err > tol){
    func_A();
    func_B();
    func_C();
    func_Par();
}

The functions modify some shared global variables, and that is how they are connected. In func_Par(), three threads are created. All threads run the same function, Threads_Func(). Inside Threads_Func(), each thread sets its own CPU affinity based on its thread number:

pthread_t curThread = pthread_self();
cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(thread_number, &cpuset);
pthread_setaffinity_np(curThread, sizeof(cpu_set_t), &cpuset);
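
The snippet above ignores the return value of pthread_setaffinity_np(), so a failed call would go unnoticed. A slightly more defensive sketch, assuming glibc on Linux, might look like the following. The helper names pin_current_thread, first_allowed_cpu and allowed_cpu_count are hypothetical and not part of the original program.

```c
#define _GNU_SOURCE   /* needed for cpu_set_t and the *_np calls in glibc */
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: lowest CPU index the calling thread may run on,
 * or -1 if the current mask cannot be read or is empty. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Hypothetical helper: pin the calling thread to a single CPU.
 * Returns 0 on success or an errno-style error number on failure. */
int pin_current_thread(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return EINVAL;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* Hypothetical helper: number of CPUs in the calling thread's mask,
 * or -1 if the mask cannot be read. */
int allowed_cpu_count(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}
```

In Threads_Func() this would be used as `int rc = pin_current_thread(thread_number);` followed by a check that rc is 0. Build with `gcc -pthread`.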

Here is the strange behaviour that I cannot explain. I am measuring the CPU time of each function, and here are the results (all in microseconds):

With setting CPU affinity in Threads_Func():

func_A: 439197
func_B: 61129
func_C: 400482
func_Par: 2488662

Without setting CPU affinity in Threads_Func():

func_A: 226677
func_B: 30922
func_C: 242516
func_Par: 4843463

As you can see, although the functions are executed in sequential order, setting CPU affinity roughly doubles the time spent in the other functions. I am trying to figure out how I should set CPU affinity (to get the performance improvement in func_Par) while avoiding the performance degradation in the other functions.

FYI, I am compiling the code with gcc using the -O0 flag to make sure that the compiler does not reorder anything. Moreover, I am using a quad-core processor and the OS is Ubuntu Linux.

Any help is appreciated. Thanks in advance for your help.

Pouya
  • How are you measuring the CPU time? – Joachim Isaksson May 30 '13 at 20:05
  • I measure CPU time using the 'gettimeofday' function. – Pouya May 30 '13 at 20:05
  • You should add to this that you are using Linux. I looked at it for a while before I realized you're not using POSIX Thread affinity. – Andon M. Coleman May 30 '13 at 20:06
  • `gettimeofday` doesn't give you CPU time. It gives wall time. – Mysticial May 30 '13 at 20:06
  • Oh, you DEFINITELY do not want to do that. In Linux you can get thread runtime in user-mode and kernel-mode (2.6.xx+, I cannot remember the exact kernel release that differentiates the two modes). As with most platforms, the POSIX way of getting thread runtime doesn't work on Linux :-\ You might have to parse /proc/PID/TID/... to get the relevant information. – Andon M. Coleman May 30 '13 at 20:07
  • @Mysticial Actually, I googled it and it seems that gettimeofday is accurate enough. When I add up the cpu times, it gives me the total execution time which seems to be accurate. – Pouya May 30 '13 at 20:09
  • It may be accurate in some situations, but in properly concurrent software it gives no indication of actual performance. The kernel will put your thread to sleep and wake it up many times during a single second. Your actual time spent executing anything in those threads can only be measured if you ask the kernel... This is effectively what you are asking for by changing affinity, which affects the kernel's scheduling behaviour. – Andon M. Coleman May 30 '13 at 20:13
  • @AndonM.Coleman Thanks for your comment. I missed the last line of the code. Now, I fixed the problem. I am using 'pthread_setaffinity_np'. – Pouya May 30 '13 at 20:13
  • @Pouya No, there's a huge difference between "CPU time" and "wall time". CPU time is the time taken up by all the cores together. So if you have 8 cores running, then the CPU clock will elapse by 8 seconds each "real" second. `gettimeofday()` gives you real (wall) time. If you want CPU time, you want something like `clock()` (for linux), or POSIX `getrusage()`. – Mysticial May 30 '13 at 20:17
  • The thing is, the kernel scheduler (almost) always knows better than you. It has all the information to "ideally" schedule not only your threads, but all the other processes' threads also. Forcing thread/processor affinity forbids the scheduler to do its work properly and most of the time results in suboptimal scheduling. Funnily enough WRT other comments, "wall time" is indeed more adequate to catch that problem than mere "CPU time"... – syam May 30 '13 at 20:19
  • The odd thing with your code is that your functions that execute _before_ your CPU_SET are affected. In other words, it's taking more time before the code even comes to the point where there's a difference. That leads me to think that your way of measuring is a bit off. – Joachim Isaksson May 30 '13 at 20:31
  • @JoachimIsaksson, Thanks for your comment. Please note that all functions are in a 'while' loop, so the functions are also executed after setting CPU affinity. – Pouya May 30 '13 at 20:34
  • @Pouya That fact may change things around quite a bit. Does the main thread wait for the 3 other threads to finish executing/joining them before returning and calling func_A() again in the loop, or could func_A() be executed simultaneously with Threads_Func()? – Joachim Isaksson May 30 '13 at 20:37
  • @JoachimIsaksson All the functions are executed sequentially which means each functions finishes its job first and then the next function can start. – Pouya May 30 '13 at 20:40
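
As the comments point out, gettimeofday() reports wall time, not CPU time. A minimal sketch of reading both for the calling thread, assuming Linux with POSIX clock_gettime(), could look like this. The helper names wall_seconds and thread_cpu_seconds are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime and the clock IDs */
#include <time.h>

/* Convert a timespec to seconds as a double. */
double ts_to_seconds(struct timespec ts)
{
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: elapsed (wall) time from a monotonic clock,
 * in seconds, or -1.0 if the clock cannot be read. */
double wall_seconds(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1.0;
    return ts_to_seconds(ts);
}

/* Hypothetical helper: CPU time consumed so far by the calling thread
 * only, in seconds, or -1.0 if the clock cannot be read. */
double thread_cpu_seconds(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
        return -1.0;
    return ts_to_seconds(ts);
}
```

Reading both clocks before and after func_A(), func_B() and func_C() and taking the differences gives two numbers per function. If the wall time grows while the thread CPU time stays roughly the same, that would suggest the extra time is spent waiting to be scheduled rather than computing.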
