
Given a data structure based on the map shown below:

std::map<int, std::set<std::vector<int>>> cliques;

where each key indicates the size of the vectors stored under it.

Initially, the map has a single key (e.g. [3]) containing the input vectors (e.g. {1, 3, 5} and {2, 4, 6}).

My function takes the vectors stored under the largest key of the map, decomposes them into all possible combinations with fewer elements, and stores those in the key corresponding to the size of the new vectors (e.g. [2] = {1,3} {1,5} {3,5} {2,4} {2,6} {4,6} and [1] = {1} {3} {5} {2} {4} {6}).

I don't know whether my solution is the most efficient, but it works well. However, since my project must handle large amounts of data, I need to parallelize the code, which led me to the following implementation:

/// Declare data structure
std::map<int, std::set<std::vector<int>>> cliques;

/// insert "input" vectors
cliques[5] = {{1, 3, 5, 7, 9}};
cliques[4] = {{1, 2, 3, 4}};

/// set boundaries
int kMax = 5;
int kMin = 1;

/// enable/disable parallel execution
bool parallelExec = true;

/// "decompose" source vectors:
for (int k = kMax; k > kMin; k--)
{
  std::set<std::vector<int>>::iterator it = cliques[k].begin();
  #pragma omp parallel num_threads(max_threads) if(parallelExec)
  {
    for(int s = 0; s < cliques[k].size(); ++s)
    {
      std::vector<int> clique;
      /// maybe should be "omp critical"?
      /// maybe "clique" should be private? (or is it already??) 
      #pragma omp single
      { 
        clique = *it;
      }
      for (int v = 0; v < clique.size(); ++v)
      {
        int& vertex = clique[v];
        std::vector<int> new_clique;
        std::copy_if(clique.begin(), clique.end(), std::back_inserter(new_clique), [vertex](const int& elem) { return elem != vertex; });
        int kNew = k - 1;
        #pragma omp critical
        { 
          cliques[kNew].insert(new_clique);
        }
      }
      #pragma omp single
      {
        it++;
      }
    }
  }
}

/// Display results
for(int i = cliques.size(); i > 0 ; i--)
{
    auto kSet = cliques[i];
    std::cout << "[" << i << "]: ";  
    for(auto& vec : kSet)
    {
        std::cout << "{";
        for(auto& elem : vec)
        {
            std::cout << elem << " ";  
        }
        std::cout << "} ";
    }
    std::cout << std::endl;
}

The use of "omp parallel" and "omp single" (rather than "omp for") is meant to allow safe access to the data structure while letting all other operations run in parallel. The code works almost perfectly, almost... it misses some (very few) sub-vectors in the final results (all of which are generated correctly when OpenMP is disabled).

Is there any OpenMP expert who can help me with this problem? Thank you in advance.

---------------

UPDATE

Minimal Complete Verifiable Example

mitxael
    standard library containers do not support concurrent access, particularly `std::set`. Find a library of C++ containers supporting parallel access - or create separate sets, then merge them. – einpoklum Apr 09 '18 at 22:28
  • @einpoklum Concurrent access is not supported for stl containers (except vector and queue), but I read some examples where (sequential) access can be done using "omp single" inside the "omp parallel" region rather than "omp for" while executing all other operations in parallel, that's what my attempt is all about. – mitxael Apr 10 '18 at 11:08

1 Answer


I'm not sure I understand all the subtleties of your algorithm, so I cannot be completely sure of my analysis. With that disclaimer out of the way, here is what I believe happens:

  1. You are not parallelizing the processing: you don't distribute the work across threads, you only replicate the same work on all threads, which step on each other's toes when writing the result back to the same location.
  2. Even that isn't done properly: the increment of the iterator, when done with an omp single nowait, allows threads to keep working on the previous iteration, since no synchronization on the value of it is performed at that stage. (NB: the omp single without nowait that protects the increment of the iterator has an implicit barrier on exit, ensuring a coherent view of this value across threads, so the discrepancy can only be between the current iteration and the previous one.)
  3. This cliques[kNew].insert(new_clique); is really where everything can explode, since the accesses to the same location are concurrent, which isn't supported by standard containers. (And it is wrong anyway, to the limit of my understanding.)

So again, please bear in mind my initial disclaimer, but I think that your algorithm is intrinsically very wrong, for many reasons, and that it is only by chance that it gives something anywhere close to what you expect.

Finally, I was about to propose an algorithm of mine, but there are so many pieces missing from your code snippet that I just can't. If you post a proper MCVE, then maybe I will.

Update: Based on your code, here is a possible parallel version:

for (int k = kMax; k > kMin; k--)
{
    std::set<std::vector<int>>::iterator it = cliques[k].begin();
    for(int s = 0; s < cliques[k].size(); ++s)
    {
        std::vector<int> clique = *it;
        #pragma omp parallel for num_threads(max_threads)
        for (int v = 0; v < clique.size(); ++v)
        {
            int& vertex = clique[v];
            std::vector<int> new_clique;
            std::copy_if(clique.begin(), clique.end(), std::back_inserter(new_clique), [vertex](const int& elem) { return elem != vertex; });
            int kNew = k - 1;
            #pragma omp critical
            cliques[kNew].insert(new_clique);
        }
        it++;
    }
}
Gilles
  • Hi @Gilles, thanks for your time. The problem isn't the algorithm itself; I've tested it and it works efficiently for my case. I just want to run it in a parallel fashion. Regarding OMP, to my understanding (excuse me if I'm wrong, but I'm an OMP newbie), after creating the parallel region with "omp parallel", all operations run in parallel except the ones enclosed in "omp single". You can find an MCVE at the following link (please check it and tell me how it should be parallelized): http://rextester.com/GGY82679 – mitxael Apr 10 '18 at 11:17
  • Well, 1/ you should update your post with your code rather than linking to an external site; and 2/ your algorithm looks like it carries a dependence between iterations, so parallelizing it won't be possible in its current form... – Gilles Apr 10 '18 at 11:39
  • In fact, I don't want to parallelize the iterations over the map's keys (lower keys depend on greater keys); I want to do it inside the set contained in each key: each thread should work on one of the (independent) vectors in the set and generate the subvectors by removing one element at a time. Do you think that's feasible? – mitxael Apr 10 '18 at 11:50
  • Nice simplification, and it works pretty well; no subvector is lost along the way. If making loops work with OMP is a matter of experience, I'll take this as a good lesson in how to arrange variables. Thank you! – mitxael Apr 10 '18 at 13:34