Strange concurrency issue with STL / OpenMP in 64 bit builds

Question

I have an odd issue when I build one of our projects in a 64 bit debug config. It appears to produce some strange behaviour which looks a little like a single iterator is getting incremented multiple times. I've narrowed it down to the following test code:

#include <omp.h>

#define _HAS_ITERATOR_DEBUGGING 0

#include <vector>
#include <set>

int main(int argc, const char* argv[]) {    
   std::vector<int> v;
   for(int j = 0; j < 20; ++j) {
      v.push_back(j);
   }

   #pragma omp parallel for
   for(int i = 0; i < 100000; ++i) {
      std::set<int> s;
      std::vector<int>::const_iterator begin = v.begin(), end = v.end();
      s.insert(begin, end); // after this line s should contain the numbers 0-19
      if(s.size() != 20) {
         printf("fail\n");
         exit(3);
      }
   }
   return 0;
}

However, the size test frequently fails which implies that somehow it is not inserting the entire contents of the vector - and a lot of poking at it makes it look rather like the vector iterators are being incremented more than one step at a time. It's quite hard to tell though since it doesn't tend to happen if one breaks in with the debugger.

The obvious conclusion to draw would be that it's not threadsafe, but my understanding is that it should be because the only variable modified is s which has local scope.

There are quite a few things which will fix the immediate problem:

remove the parallel for
throw a critical section around the insert() call
#define HAS_ITERATOR_DEBUGGING 1
replace the single insert() call with a manual loop and insert each item individually (this is basically what that function does internally, but the problem definitely goes away when I do it myself)
build a 32bit version of the same code
build the release version of the same code

This is being compiled under MSVC++ 2008 SP1, with the compiler-supplied STL implementation.

Can anyone shed any light on what's going on here? Thanks in advance for any hints - I'm pretty stumped :)

Edit: In case it is not clear, I am not looking for a "quick fix" to make this code work; as noted above I know of quite a few of those. What I want to understand is why this problem is occurring in the first place.

Edit 2: Code works correctly when compiled with gcc.

Might this be due to the fact that your `s`, `begin` and `end` variables are shared across all threads? Try adding `private(s), private(begin), private(end)` after `omp parallel for`. — AVH, Apr 03 '11 at 21:02
Darhuuk: adding those private clauses doesn't compile ("undeclared identifier"). my understanding is that any variable declared inside the loop is implicitly private anyway? — Peter, Apr 03 '11 at 21:03
fazo: only happens without optimization (/Od). At /O1 the problem goes away. — Peter, Apr 03 '11 at 21:07
For the #pragma omp parallel for, do I have to have anything on my system for that to actually work? I tried compiling your solution for x64 in VS10 in Debug, ran it, and had no problems; so I'm not sure. — leetNightshade, Apr 11 '11 at 06:26
leetNightshade: you have to turn on OpenMP in the project properties for the pragma to mean anything (Configuration Properties > C/C++ > Language > OpenMP Support, or append /openmp to the command line) — Peter, Apr 11 '11 at 21:03
Just a wild guess but it may be that OpenMP detects that `s.insert(begin, end);` is a loop as well, tries to parallelize it but fails since it uses the `!=` syntax which AFAIK is not supported as a termination condition for a `for` loop... — Eugen Constantin Dinca, Apr 12 '11 at 19:14
@Eugen: OpenMP does not do or attempt to do automatic parallelisation; there must be a pragma that marks the loop as parallel. Compilers can parallelise some loops automatically - e.g. Intel's compiler has an option for that; but I am not aware of a similar option in MSVC compiler. — Alexey Kukanov, Apr 13 '11 at 07:45
I strongly suspect a bug in MS2008 STL implementation's thread safety. Try this without openmp and simple threading? — sehe, Apr 16 '11 at 20:14
sehe: My thoughts are similar, but I'm treading a little carefully before blaming them. Thanks for the idea - I will try it when back at work tomorrow. — Peter, Apr 16 '11 at 22:10
@Peter: good attitude. FWIW I tested this with `g++-{4.5|4.6} {-D_GLIBCXX_PARALLEL} {-g} -fopenmp -lgomp` (`{}` denoting optionals) running under both valgrind --tool=memcheck and valgrind --tool=helgrind with no abnormal results (except that running under valgrind memcheck seemed to prevent multiprocessing, effectively locking the work to 1 core, but that's probably the locking in memcheck's malloc/free wrappers) — sehe, Apr 16 '11 at 22:51
sehe: have tried using Windows threading which fixes it, so would appear there's something wrong with their OpenMP implementation. Appears to work if I move the interior of the loop into a separate function - maybe the OpenMP code generated scopes some variables incorrectly? — Peter, Apr 18 '11 at 01:12

Alexey Kukanov · Answer 1 · 2011-04-04T11:08:21.003

2

Looks like a bug in VS 2008 implementation of OpenMP and/or STL. With VS 2010, it is not reproduced; moreover it is not reproduced with Intel's compiler and VS 2008 headers, which makes me think that the bug is more likely in OpenMP support in VC++ compiler.

EDIT: The code with firstprivate I posted before does not work indeed, even with _SECURE_SCL=1. Seems I just had a single lucky run.

edited Apr 04 '11 at 11:08

answered Apr 03 '11 at 22:53

Alexey Kukanov

12,479
2
36
55

Doesn't seem to fix it for me...? _SECURE_SCL hasn't seemed to have much impact here except that it does sometimes pick up that things are wrong just before hilarity ensues. – Peter Apr 04 '11 at 00:39

score 0 · Answer 2 · answered Apr 16 '11 at 17:18

0

I agree with darhuk, I would suggest to mark s private

#pragma omp parallel for private(s)

answered Apr 16 '11 at 17:18

towi

21,587
28
106
187

As I noted in the comments, this does not work; s is defined inside the loop so the above code does not even compile. There should be one copy of `s` per thread anyway. – Peter Apr 16 '11 at 21:56
Variables declared inside the scope of the loop are automatically private. Just like the loop counter i in this case. – Trass3r May 28 '13 at 13:18

John Gordon · Answer 3 · 2011-04-16T19:53:07.930

0

I've never written any OpenMP code, but I have done a lot of multi-threaded programming, so my instincts may or may not be right.

The insert() method on the set is not thread safe, and it's failing because multiple threads are invoking it concurrently. You must lock (or some OpenMP equivalent) to guarantee the inserts are performed sequentially.

EDIT: Try this

#include <omp.h>

#define _HAS_ITERATOR_DEBUGGING 0

#include <vector>
#include <set>


int main(int argc, const char* argv[]) {    
   std::vector<int> v;
   for(int j = 0; j < 20; ++j) {
      v.push_back(j);
   }


omp_lock_t writelock;
OMP_INIT_LOCK(&writelock);

   #pragma omp parallel for
   for(int i = 0; i < 100000; ++i) {
      std::set<int> s;
      std::vector<int>::const_iterator begin = v.begin(), end = v.end();
      OMP_SET_LOCK(&writelock);
      s.insert(begin, end); // after this line s should contain the numbers 0-19
      OMP_UNSET_LOCK(&writelock);
      if(s.size() != 20) {
         printf("fail\n");
         exit(3);
      }
   }
   return 0;
}

edited Apr 16 '11 at 19:53

answered Apr 16 '11 at 19:21

John Gordon

2,576
3
24
29

It is not the same set - there should be one set per thread, because it is created inside the parallel region. So as long as the insert() method is reentrant, that should be enough. – Peter Apr 16 '11 at 21:55
I have tried an equivalent to this using `omp critical` directives which does solve it, but doesn't explain what is actually going wrong. I don't think I should have to add a lock to the production version of this since it works fine without in 32-bit builds. – Peter Apr 16 '11 at 22:04
Oh, of course. I should have realized it couldn't be as straightforward as that. – John Gordon Apr 16 '11 at 23:26

Strange concurrency issue with STL / OpenMP in 64 bit builds

3 Answers3