
Our processors are allowed to reorder instructions in order to gain some performance benefit, but this may cause some strange behaviour. I'm trying to reproduce one of these issues, based on this article.

This is my code:

#include <thread>
#include <mutex>
#include <iostream>
using namespace std;

int a,b;
int r1,r2;
mutex m1,m2;

void th1()
{
    for(;;)
    {
        m1.lock();
        a=1;
        asm volatile("" ::: "memory"); 
        r1=b;
        m1.unlock();
    }
}

void th2()
{
    for(;;)
    {
        m2.lock();
        b=1;
        asm volatile("" ::: "memory"); 
        r2=a;
        m2.unlock();
    }
}

int main()
{
    int cnt{0};
    thread thread1{th1};
    thread thread2{th2};
    thread1.detach();
    thread2.detach();
    for(int i=0;i<10000;i++)
    {
        m1.lock();
        m2.lock();
        a=b=0;
        m1.unlock();
        m2.unlock();
        if(r1==0&&r2==0)
        {
            ++cnt;
        }
    }
    cout<<cnt<<" CPU reorders happened!\n";
}

I use the mutexes to be sure that the 'main' thread will not modify a or b while th1 or th2 are doing their jobs. The output of the execution changes every time: it may be 0, it may be 10000, or a random number between 0 and 10000.

There's something about this code that makes me a little uncomfortable: I'm not sure whether it really reproduces the CPU reordering phenomenon.

From the code, it looks like the only way r1 and r2 can be 0 in the 'if' is if th1 and th2 set them from 'a' and 'b', which in the context of th1 and th2 cannot be 0 due to the lock mechanism. So the only way those variables can be 0 is because of instruction reordering. Is this correct?

Thanks

fjanisze
  • Note that both the compiler is allowed to reorder instructions when compiling the program, and the CPU pipeline scheduler may reorder the instructions (and/or micro-instructions) at runtime. – Manu343726 Apr 06 '14 at 16:16
  • The compiler reordering was prevented using the memory fence; also, disassembling the compiled code, the sequence of instructions in th1 and th2 is the one I expect. So if any reordering happens it should be due to the processor, am I wrong? – fjanisze Apr 06 '14 at 16:21
  • You don't initialize a and b before starting the threads, but use them to assign to r1 and r2 as if they were. – Ebbe M. Pedersen Apr 06 '14 at 16:22
  • @ Ebbe M. Pedersen - a & b are global variables which would be value initialized. – MS Srikkanth Apr 06 '14 at 16:28
  • @fjanisze - I still don't get why r1 & r2 can't be zero. It seems perfectly reasonable to me to expect anything between 0 and 10000... why are you saying this is wrong? And what do you expect as an answer? – MS Srikkanth Apr 06 '14 at 16:30
  • Assume this is the order in which the threads run continuously... first thread 1 runs... assigns 1 to a, and before it can assign b to r1, thread 2 takes over. Thread 2 assigns 1 to b and assigns a to r2 and unlocks m2... then thread 1 runs, which assigns b to r1 and unlocks m1... now the main thread runs... – MS Srikkanth Apr 06 '14 at 16:31
  • I suppose just one mutex should be used for all. When I run the code locally with only one mutex, the output has an average value of 20, which is about 1 reorder each 5000 cycles; that looks compatible with the findings of the guy in the article. Anyway my point is – fjanisze Apr 06 '14 at 16:46
  • It sounds like you're confusing memory operation re-ordering with instruction re-ordering. These are completely different concepts. Memory re-ordering comes from prefetching and posted writes and can occur even on processors that don't reorder instructions at all. – David Schwartz Apr 06 '14 at 16:47

2 Answers


Your program is very different from the one in the article that you cited from preshing.com. The preshing.com program uses semaphores where yours uses mutexes.

Mutexes are simpler than semaphores. They make only one guarantee: only one thread at a time can hold the mutex. That is to say, they can only be used for mutual exclusion.

The preshing.com program does something with its semaphores that you can't do with mutexes alone: It synchronizes the loops in the three threads so that they all proceed in lock-step. Thread1 and Thread2 each wait at the top of their loop until main() lets them go, and then main waits at the bottom of its loop until they have completed their work. Then they all go 'round again.

You can't do that with mutexes. In your program, what prevents main from going around its loop thousands of times before either of the other two threads gets to run at all? Nothing but chance. Nor does anything prevent Thread1 and/or Thread2 from looping thousands of times while main() is blocked, waiting for its next time slice.

Remember, a semaphore is a counter. Look carefully at how the semaphores in the preshing.com program are incremented and decremented by the threads, and you will see how it keeps the threads synchronized.

Solomon Slow
  • I think this is the point, I had already the idea that it must be wrong using the mutexes but was not sure. Thanks all for the answers. – fjanisze Apr 07 '14 at 05:21

I made the mistake of using mutexes instead of semaphores (thanks James Large); here is the properly working code:

#include <mutex>
#include <condition_variable>
#include <thread>
#include <iostream>
using namespace std;

class semaphore{
private:
    mutex mtx;
    condition_variable cv;
    int cnt;

public:
    semaphore(int count = 0):cnt(count){}
    void notify()
    {
        unique_lock<mutex> lck(mtx);
        ++cnt;
        cv.notify_one();
    }
    void wait()
    {
        unique_lock<mutex> lck(mtx);

        while(cnt == 0){
            cv.wait(lck);
        }
        --cnt;
    }
};

int a,b;
int r1,r2;
semaphore s1,s2,s3;

void th1()
{
    for(;;)
    {
        s1.wait();
        a=1;
        asm volatile("" ::: "memory");
        r1=b;
        s3.notify();
    }
}

void th2()
{
    for(;;)
    {
        s2.wait();
        b=1;
        asm volatile("" ::: "memory");
        r2=a;
        s3.notify();
    }
}

int main()
{
    int cnt{0};
    thread thread1{th1};
    thread thread2{th2};
    thread1.detach();
    thread2.detach();
    for(int i=0;i<100000;i++)
    {
        a=b=0;
        s1.notify();
        s2.notify();
        s3.wait();
        s3.wait();

        if(r1==0&&r2==0)
        {
            ++cnt;
        }
    }
    cout<<cnt<<" CPU reorders happened!\n";
}

The reordering seems to be properly reproduced.

fjanisze