7

I have a code that is as follow (simplified code):

for( int i = 0; i < input.rows; i++ )
{
    if(IsGoodMatch(input[I])
    { 
        Newvalues newValues;
        newValues.x1=input.x1;
        newValues.x2=input.x1*2;
        output.push_back( newValues);
    }
} 

This code works well, but if I want to make it parallel using omp parallel for, I am getting error on output.push_back and it seems that during vector resize, the memory corrupted.

What is the problem and how can I fix it?

How can I make sure only one thread inserting a new item into vector at any time?

LihO
  • 41,190
  • 11
  • 99
  • 167
mans
  • 17,104
  • 45
  • 172
  • 321
  • 3
    Are you using any synchronization mechanisms? – Luchian Grigore Oct 09 '13 at 09:35
  • @LuchianGrigore: no! The code is as shown above! – mans Oct 09 '13 at 09:36
  • Non-const functions of `std` containers are not thread-safe; you must synchronise if you want to modify them concurrently. – Angew is no longer proud of SO Oct 09 '13 at 09:40
  • You're lucky that you didn't call `output.reserve(input.rows)`. That would have prevented the memory reallocation, but the code would still be thread-unsafe. And that might manifest itself in "missing" values, which would be far harder to spot. – MSalters Oct 09 '13 at 10:06
  • Must the output be ordered like the input? Keeping that order is a bit of a challenge when the actual input isn't processed in-order, and there's no 1 to 1 match with `output[]`. Depending on the relative speeds of all functions, it may be worthwhile to determine `goodInput[]` sequentially, and then convert that 1 to 1, in parallel, to `output[]`. If `isGoodMatch` is the problem, calculate `inputMask[]` in parallel, and then do the remainder sequentially. – MSalters Oct 09 '13 at 10:09
  • @MSalters: output doesn't need to be in the same order as input. – mans Oct 09 '13 at 11:08
  • Some people, when confronted with a problem, think, "I know, I'll use threads," and then two they hav erpoblesms. – Slava Jun 16 '17 at 15:55

6 Answers6

8

The simple answer is that std::vector::push_back is not thread-safe.

In order to safely do this in parallel you need to synchronize in order to ensure that push_back isn't called from multiple threads at the same time.

Synchronization in C++11 can easily be achieved by using an std::mutex.

Agentlien
  • 4,996
  • 1
  • 16
  • 27
6

std::vector's push_back can not guarantee a correct behavior when being called in a concurrent manner like you are doing now (there is no thread-safety).

However since the elements don't depend on each other, it would be very reasonable to resize the vector and modify elements inside the loop separately:

output.resize(input.rows);
int k = 0;

#pragma omp parallel for shared(k, input)
for( int i = 0; i < input.rows; i++ )
{
    if(IsGoodMatch(input[I])
    { 
        Newvalues newValues;
        ...
        // ! prevent other threads to modify k !
        output[k] = newValues;
        k++;
        // ! allow other threads to modify k again !
    }
} 

output.resize(k);

since the direct access using operator[] doesn't depend on other members of std::vector which might cause inconsistencies between the threads. However this solution might still need an explicit synchronization (i.e. using a synchronization mechanism such as mutex) that will ensure that a correct value of k will be used.

"How can I make sure only one thread inserting a new item into vector at any time?"

You don't need to. Threads will be modifying different elements (that reside in different parts of memory). You just need to make sure that the element each thread tries to modify is the correct one.

LihO
  • 41,190
  • 11
  • 99
  • 167
  • This doesn't work as the number of items in output is not the same as input. Please look at if(isGoodmatch()) – mans Oct 09 '13 at 09:46
  • 1
    @mans: See my edit. Still you can rely on `output.size()` <= `input.size()` – LihO Oct 09 '13 at 09:49
  • 3
    You still need to make `output[k] = newValues; k++;` atomic, or there is a race condition, as two threads can both write to the same `k`. – Anton Guryanov Oct 09 '13 at 09:56
  • @AntonGuryanov: Although this sounds perfectly reasonable, I think incrementing an `int` variable this way should be handled by OMP (but I must admit that I'm not sure though) – LihO Oct 09 '13 at 10:02
  • 1
    @LihO: even if `k++` is atomic, it doesn't follow that if `output[k] = newValues` is threadsafe and `k++` is atomic, then `output[k] = newValues; k++;` is atomic. Two threads could both see the same value of `k` in the fist line before either one of them increments it in the second line. – Steve Jessop Oct 09 '13 at 10:12
  • @SteveJessop: Yes, I understand Anton's point, but... Are you sure this isn't handled internally by OMP? – LihO Oct 09 '13 at 10:15
  • 1
    @LihO: I'm not *certain* that it isn't, I've never used OMP. But it seems like quite a big claim to me, that I'd consider unlikely unless someone is sure it is. Do you mean that OMP will detect all uses of `k` in the loop, and make the block of code from the first to the last into an atomic operation? For a slightly different loop (one that uses `k` early in the loop body) that could prevent OMP from doing any parallelization at all. But maybe that's the point of declaring `k` shared, I don't know... – Steve Jessop Oct 09 '13 at 10:19
  • @SteveJessop: Yes, that's what I was pointing out... that there might be more complex functionality hidden behind it when we explicitly express that `k` is shared between the threads. However I've edited my answer :) – LihO Oct 09 '13 at 10:30
  • 1
    [The correct way to capture and increment k](http://stackoverflow.com/a/7918281/15416) – MSalters Oct 09 '13 at 11:32
  • @MSalters: That is an awesome link :) – LihO Oct 09 '13 at 11:34
2

Use concurrent vector

#include <concurrent_vector.h>

Concurrency::concurrent_vector<int> in c++11.

It is thread safe version of vector.

Alexei - check Codidact
  • 22,016
  • 16
  • 145
  • 164
ahmed raza
  • 21
  • 1
  • 1
    I don't see _any_ evidence anywhere that `concurrent_vector` is "in c++11". There's an implementation of it in Intel's Threading Building Blocks as [`tbb:concurrent_vector`](https://software.intel.com/en-us/node/506203), and Microsoft's Concurrency Runtime (ConCRT) has a a [`concurrency::concurrent_vector`](https://learn.microsoft.com/en-us/cpp/parallel/concrt/reference/concurrency-namespace?view=vs-2019), but those are specific to Intel TBB and Visual C++, respectively. (OpenMP _can_ be used in MSVC, granted. But Intel recommends not mixing it with TBB.) – FeRD Jun 10 '20 at 22:32
2

Put a #pragma omp critical before the push_back.

David Doria
  • 9,873
  • 17
  • 85
  • 147
0

I solved a similar problem by deriving the standard std::vector class just to implement an atomic_push_back method, suitable to work in the OpenMP paradigm.

Here is my "OpenMP-safe" vector implementation:

template <typename T>
class omp_vector : public std::vector<T>
{
    private:
    omp_lock_t lock;
    public:
    omp_vector()
    {
         omp_init_lock(&lock);
    }
    void atomic_push_back(T const &p)
    {
        omp_set_lock(&lock);
        std::vector<T>::push_back(p);
        omp_unset_lock(&lock);
    }
};

of course you have to include omp.h. Then your code could be just as follows:

opm_vector<...> output;

#pragma omp parallel for shared(input,output)     
for( int i = 0; i < input.rows; i++ )
{
    if(IsGoodMatch(input[I])
    { 
        Newvalues newValues;
        newValues.x1=input.x1;
        newValues.x2=input.x1*2;
        output.atomic_push_back( newValues);
    }
}

If you still need the output vector somewhere else in a non-parallel section of the code, you could just use the normal push_back method.

Paolo M
  • 9
  • 2
-9

You can try to use a mutex to fix the problem. Usually I prefer to achieve such thing myself;

static int mutex=1;
int signal(int &x)
{
    x+=1;
    return 0;
}
int wait(int &x)
{
    x-=1;
    while(x<0);
    return 0;
}
for( int i = 0; i < input.rows; i++ )
{
    if(IsGoodMatch(input[I])
    {
        Newvalues newValues;
        newValues.x1=input.x1;
        newValues.x2=input.x1*2;
        wait(mutex);
        output.push_back( newValues);
        signal(mutex);
    }
} 

Hope this could help.

ImaChinese
  • 14
  • 1
  • 7
  • 7
    Good demonstration of why you shouldn't try to write your own mutex – Jonathan Wakely Oct 09 '13 at 10:04
  • 4
    Using a mutex to synchronize accesses to the vector is a plausible approach, but the only thing your code synchronizes it the CPU temperature with the time your program is executed. – Frerich Raabe Oct 09 '13 at 10:30