PThreads & MultiCore CPU on Linux

Question

I am writing a simple application that uses threads to increase performance. The problem is that this application runs fine on Windows, using both cores of my CPU, but when I run it on Linux it seems to use only one core.

I can't understand why this happens.

This is my code (C++):

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <time.h>

void* function(void*)
{
    int i=0;
    for(i=0; i<1110111; i++)
        rand();
    return 0;
}

void withOutThreads(void)
{
    function(0);
    function(0);
}

void withThreads(void)
{
    pthread_t* h1 = new pthread_t;
    pthread_t* h2 = new pthread_t;
    pthread_attr_t* atr = new pthread_attr_t;

    pthread_attr_init(atr);
    pthread_attr_setscope(atr,PTHREAD_SCOPE_SYSTEM);

    pthread_create(h1,atr,function,0);
    pthread_create(h2,atr,function,0);

    pthread_join(*h1,0);
    pthread_join(*h2,0);
    pthread_attr_destroy(atr);
    delete h1;
    delete h2;
    delete atr;
}

int main(void)
{
    int ini,tim;
    ini = clock();
    withOutThreads();
    tim = (int) ( 1000*(clock()-ini)/CLOCKS_PER_SEC );
    printf("Time Sequential: %d ms\n",tim);
    fflush(stdout);

    ini = clock();
    withThreads();
    tim = (int) ( 1000*(clock()-ini)/CLOCKS_PER_SEC );
    printf("Time Concurrent: %d ms\n",tim);
    fflush(stdout);
    return 0;
}

Output on Linux:

Time Sequential: 50 ms
Time Concurrent: 1610 ms

Output on Windows:

Time Sequential: 50 ms
Time Concurrent: 30 ms

This question's title does not describe the question. In addition "uses Threads to increase performance" potentially signifies a red herring. — Lightness Races in Orbit, Aug 01 '11 at 18:19

score 20 · Accepted Answer · answered Feb 21 '11 at 17:02

clock() works differently on Windows and Linux, so don't use it to measure time. On Linux it measures CPU time; on Windows it measures wall-clock time. Ideally these would be the same in this test case, but you should use something consistent between the platforms to measure the time, e.g. gettimeofday().

rand() serializes your threads on Linux: it holds an internal lock so as to be thread-safe. The rand() man page states that rand() is neither thread-safe nor reentrant; however, at least the code in recent glibc acquires a lock around the call. I'm not sure how Windows handles this: either it's not thread-safe at all, or it uses thread-local variables.

Use rand_r on Linux, or find some better CPU-bound function to measure.

void* function(void*)
{
    unsigned int seed = 42;
    int i=0;
    for(i=0; i<1110111; i++)
        rand_r(&seed);
    return 0;
}

And when it measures CPU time on Linux it measures it across *all* the threads in the program. — Zan Lynx, Feb 21 '11 at 18:51

score 9 · Answer 2 · answered Feb 21 '11 at 17:05

The problem is that the multi-threaded version of rand() on Linux locks a mutex. Change your function to:

void* function(void*)
{
    int i=0;
    unsigned rand_state = 0;
    for(i=0; i<1110111; i++)
        rand_r(&rand_state);
    return 0;
}

Output:

Time Sequential: 10 ms
Time Concurrent: 10 ms

Sergio Troiano · Answer 3 · 2015-02-01T22:14:02.860

Linux "sees" threads as processes: every thread is a schedulable task, and a process is simply a group of one or more such tasks.

In the kernel's process table (task_struct), creating a process assigns it a PID. When a second thread is created, that PID becomes the TGID (thread group ID), and every thread gets its own TID (thread ID).

In userland we see only one entry per process (with ps aux), but if we run "ps -eLf" we see an extra column named LWP (light-weight process), which is the TID.

For example:

$ ps -eLf
UID    PID  PPID   LWP C NLWP STIME TTY     TIME CMD
root  1356     1  1356 0    4  2014 ?   00:00:00 /sbin/rsyslogd
root  1356     1  1357 0    4  2014 ?   00:02:01 /sbin/rsyslogd
root  1356     1  1359 0    4  2014 ?   00:01:55 /sbin/rsyslogd
root  1356     1  1360 0    4  2014 ?   00:00:00 /sbin/rsyslogd
dbus  1377     1  1377 0    1  2014 ?   00:00:00 dbus-daemon

As we can see, the PID is the same for every thread, but each thread's real ID is the LWP (TID). When the process has only one thread (such as dbus-daemon), PID = LWP (TID).

Internally, the kernel always uses the TID the way it uses a PID: as the identifier of the schedulable task.

Because each thread is a separate task, the kernel can schedule every thread independently, with real parallelism across cores.

score -2 · Answer 4 · edited Aug 01 '11 at 18:20


That sounds like an OS scheduler implementation detail to me, not per se a problem in your code. The OS decides which thread runs on which core, and if thread/CPU affinity rules are in effect, it will keep a thread on the same CPU each time.

That is a simple explanation for a fairly complex subject.

answered Feb 21 '11 at 16:54 by Tony The Lion · edited Aug 01 '11 at 18:20 by Lightness Races in Orbit

How is it incorrect? There is no "problem" here. The OS is free to run threads wherever it likes. – Lightness Races in Orbit Aug 01 '11 at 18:20
I think the point is that, whilst the OS is free to run all the threads on just one core and nuke the potential performance, the use of rand() appears to be something that *will* cause problems. – thecoshman Nov 16 '12 at 14:08
