std::vector push_back fails when used in a parallel for loop

Question

I have a code that is as follow (simplified code):

for( int i = 0; i < input.rows; i++ )
{
    if(IsGoodMatch(input[I])
    { 
        Newvalues newValues;
        newValues.x1=input.x1;
        newValues.x2=input.x1*2;
        output.push_back( newValues);
    }
}

This code works well, but if I want to make it parallel using omp parallel for, I am getting error on output.push_back and it seems that during vector resize, the memory corrupted.

What is the problem and how can I fix it?

How can I make sure only one thread inserting a new item into vector at any time?

Non-const functions of `std` containers are not thread-safe; you must synchronise if you want to modify them concurrently. — Angew is no longer proud of SO, Oct 09 '13 at 09:40
You're lucky that you didn't call `output.reserve(input.rows)`. That would have prevented the memory reallocation, but the code would still be thread-unsafe. And that might manifest itself in "missing" values, which would be far harder to spot. — MSalters, Oct 09 '13 at 10:06
Must the output be ordered like the input? Keeping that order is a bit of a challenge when the actual input isn't processed in-order, and there's no 1 to 1 match with `output[]`. Depending on the relative speeds of all functions, it may be worthwhile to determine `goodInput[]` sequentially, and then convert that 1 to 1, in parallel, to `output[]`. If `isGoodMatch` is the problem, calculate `inputMask[]` in parallel, and then do the remainder sequentially. — MSalters, Oct 09 '13 at 10:09
@MSalters: output doesn't need to be in the same order as input. — mans, Oct 09 '13 at 11:08
Some people, when confronted with a problem, think, "I know, I'll use threads," and then two they hav erpoblesms. — Slava, Jun 16 '17 at 15:55

score 8 · Answer 1 · answered Oct 09 '13 at 09:42

8

The simple answer is that std::vector::push_back is not thread-safe.

In order to safely do this in parallel you need to synchronize in order to ensure that push_back isn't called from multiple threads at the same time.

Synchronization in C++11 can easily be achieved by using an std::mutex.

answered Oct 09 '13 at 09:42

Agentlien

4,996
1
16
27

1

Why not just put a `#pragma omp critical` before the push_back? – David Doria Aug 16 '16 at 22:27
@DavidDoria I am not familiar with OpenMP and simply gave the standard C++ solution. – Agentlien Aug 17 '16 at 05:14
@DavidDoria your suggestion worked perfectly for me. You should make it an official answer – Vince W. Mar 30 '20 at 03:11
@VinceW. I just posted the 'critical' recommendation as an answer. – David Doria Mar 30 '20 at 10:47

LihO · Answer 2 · 2013-10-09T10:26:56.400

6

std::vector's push_back can not guarantee a correct behavior when being called in a concurrent manner like you are doing now (there is no thread-safety).

However since the elements don't depend on each other, it would be very reasonable to resize the vector and modify elements inside the loop separately:

output.resize(input.rows);
int k = 0;

#pragma omp parallel for shared(k, input)
for( int i = 0; i < input.rows; i++ )
{
    if(IsGoodMatch(input[I])
    { 
        Newvalues newValues;
        ...
        // ! prevent other threads to modify k !
        output[k] = newValues;
        k++;
        // ! allow other threads to modify k again !
    }
} 

output.resize(k);

since the direct access using operator[] doesn't depend on other members of std::vector which might cause inconsistencies between the threads. However this solution might still need an explicit synchronization (i.e. using a synchronization mechanism such as mutex) that will ensure that a correct value of k will be used.

"How can I make sure only one thread inserting a new item into vector at any time?"

You don't need to. Threads will be modifying different elements (that reside in different parts of memory). You just need to make sure that the element each thread tries to modify is the correct one.

edited Oct 09 '13 at 10:26

answered Oct 09 '13 at 09:41

LihO

41,190
11
99
167

This doesn't work as the number of items in output is not the same as input. Please look at if(isGoodmatch()) – mans Oct 09 '13 at 09:46
1

@mans: See my edit. Still you can rely on `output.size()` <= `input.size()` – LihO Oct 09 '13 at 09:49
3

You still need to make `output[k] = newValues; k++;` atomic, or there is a race condition, as two threads can both write to the same `k`. – Anton Guryanov Oct 09 '13 at 09:56
@AntonGuryanov: Although this sounds perfectly reasonable, I think incrementing an `int` variable this way should be handled by OMP (but I must admit that I'm not sure though) – LihO Oct 09 '13 at 10:02
1

@LihO: even if `k++` is atomic, it doesn't follow that if `output[k] = newValues` is threadsafe and `k++` is atomic, then `output[k] = newValues; k++;` is atomic. Two threads could both see the same value of `k` in the fist line before either one of them increments it in the second line. – Steve Jessop Oct 09 '13 at 10:12
@SteveJessop: Yes, I understand Anton's point, but... Are you sure this isn't handled internally by OMP? – LihO Oct 09 '13 at 10:15
1

@LihO: I'm not *certain* that it isn't, I've never used OMP. But it seems like quite a big claim to me, that I'd consider unlikely unless someone is sure it is. Do you mean that OMP will detect all uses of `k` in the loop, and make the block of code from the first to the last into an atomic operation? For a slightly different loop (one that uses `k` early in the loop body) that could prevent OMP from doing any parallelization at all. But maybe that's the point of declaring `k` shared, I don't know... – Steve Jessop Oct 09 '13 at 10:19
@SteveJessop: Yes, that's what I was pointing out... that there might be more complex functionality hidden behind it when we explicitly express that `k` is shared between the threads. However I've edited my answer :) – LihO Oct 09 '13 at 10:30
1

[The correct way to capture and increment k](http://stackoverflow.com/a/7918281/15416) – MSalters Oct 09 '13 at 11:32
@MSalters: That is an awesome link :) – LihO Oct 09 '13 at 11:34

score 2 · Answer 3 · edited Jun 16 '17 at 15:51

2

Use concurrent vector

#include <concurrent_vector.h>

Concurrency::concurrent_vector<int> in c++11.

It is thread safe version of vector.

edited Jun 16 '17 at 15:51

Alexei - check Codidact

22,016
16
145
164

answered Jun 16 '17 at 15:29

ahmed raza

21
1

1

I don't see _any_ evidence anywhere that `concurrent_vector` is "in c++11". There's an implementation of it in Intel's Threading Building Blocks as [`tbb:concurrent_vector`](https://software.intel.com/en-us/node/506203), and Microsoft's Concurrency Runtime (ConCRT) has a a [`concurrency::concurrent_vector`](https://learn.microsoft.com/en-us/cpp/parallel/concrt/reference/concurrency-namespace?view=vs-2019), but those are specific to Intel TBB and Visual C++, respectively. (OpenMP _can_ be used in MSVC, granted. But Intel recommends not mixing it with TBB.) – FeRD Jun 10 '20 at 22:32

score 2 · Answer 4 · answered Mar 30 '20 at 10:47

2

Put a #pragma omp critical before the push_back.

answered Mar 30 '20 at 10:47

David Doria

9,873
17
85
147

score 0 · Answer 5 · answered Jan 24 '22 at 12:02

I solved a similar problem by deriving the standard std::vector class just to implement an atomic_push_back method, suitable to work in the OpenMP paradigm.

Here is my "OpenMP-safe" vector implementation:

template <typename T>
class omp_vector : public std::vector<T>
{
    private:
    omp_lock_t lock;
    public:
    omp_vector()
    {
         omp_init_lock(&lock);
    }
    void atomic_push_back(T const &p)
    {
        omp_set_lock(&lock);
        std::vector<T>::push_back(p);
        omp_unset_lock(&lock);
    }
};

of course you have to include omp.h. Then your code could be just as follows:

opm_vector<...> output;

#pragma omp parallel for shared(input,output)     
for( int i = 0; i < input.rows; i++ )
{
    if(IsGoodMatch(input[I])
    { 
        Newvalues newValues;
        newValues.x1=input.x1;
        newValues.x2=input.x1*2;
        output.atomic_push_back( newValues);
    }
}

If you still need the output vector somewhere else in a non-parallel section of the code, you could just use the normal push_back method.

score -9 · Answer 6 · answered Oct 09 '13 at 09:55

-9

You can try to use a mutex to fix the problem. Usually I prefer to achieve such thing myself;

static int mutex=1;
int signal(int &x)
{
    x+=1;
    return 0;
}
int wait(int &x)
{
    x-=1;
    while(x<0);
    return 0;
}
for( int i = 0; i < input.rows; i++ )
{
    if(IsGoodMatch(input[I])
    {
        Newvalues newValues;
        newValues.x1=input.x1;
        newValues.x2=input.x1*2;
        wait(mutex);
        output.push_back( newValues);
        signal(mutex);
    }
}

Hope this could help.

answered Oct 09 '13 at 09:55

ImaChinese

14
1
7

7

Good demonstration of why you shouldn't try to write your own mutex – Jonathan Wakely Oct 09 '13 at 10:04
4

Using a mutex to synchronize accesses to the vector is a plausible approach, but the only thing your code synchronizes it the CPU temperature with the time your program is executed. – Frerich Raabe Oct 09 '13 at 10:30

std::vector push_back fails when used in a parallel for loop

6 Answers6

Linked