OpenMP - std::next_permutation

Question

I am trying to parallelize my own C++ implementation of Travelling Salesman Problem using OpenMP.

I have a function cost() that calculates the cost of a road, and a vector [0,1,2,...,N], where N is the number of nodes on the road.

In main(), I am trying to find the best road:

do
{
    cost();
} while (std::next_permutation(permutation_base, permutation_base + operations_number));

I tried using #pragma omp parallel to parallelize that code, but it only made it slower. Is there any way to parallelize that code?

score 5 · Accepted Answer · edited Aug 19 '18 at 16:36

#pragma omp parallel doesn't automatically divide the computation among separate threads. If you want to divide the computation, you additionally need to use #pragma omp for; otherwise the whole computation is done multiple times, once per thread. For instance, the following code prints "Hello World!" four times on my laptop, since it has 4 cores.

#include <iostream>

int main(int argc, char* argv[]){
    #pragma omp parallel
    std::cout << "Hello World!\n";
}

The same thing happens to your code if you simply write #pragma omp parallel: it gets executed multiple times, once per thread, so your program won't be any faster. If you want to divide the work among the threads (so that each thread does different things), you have to use something like #pragma omp parallel for.
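To illustrate the difference, here is a minimal sketch (the function name sum_squares is made up for this example). With #pragma omp parallel for, the loop iterations are split among the threads, and the reduction clause combines the per-thread partial sums. If you compile without OpenMP support, the pragma is ignored and the loop simply runs serially with the same result.

```cpp
#include <cassert>

// Hypothetical example function: sums i*i for i in [0, n).
// The n iterations are divided among the threads, so each iteration
// runs exactly once in total, not once per thread.
long long sum_squares(int n) {
    assert(n >= 0);
    long long total = 0;
    // Each thread accumulates a private partial sum; OpenMP adds the
    // partial sums into `total` when the loop finishes.
    #pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < n; ++i) {
        total += static_cast<long long>(i) * i;
    }
    return total;
}
```

Calling sum_squares(4) adds 0 + 1 + 4 + 9 regardless of how many threads take part.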

Now we can look at your code. It isn't suited for parallelization as written. Let's see why. You start with your array permutation_base and calculate the cost. Then you modify permutation_base with next_permutation. You have to wait for the cost computation to finish before you are allowed to modify the array, because otherwise the cost computation would be wrong. So the loop as written can't be split across separate threads.

One possible solution is to keep multiple copies of your array permutation_base and let each copy run through only a part of all permutations. For instance:

vector<int> permutation_base{1, 2, 3, 4};
int n = permutation_base.size();
#pragma omp parallel for
for (int i = 0; i < n; ++i) {
    // Make a copy of permutation_base
    auto perm = permutation_base;
    // rotate the i'th  element to the front
    // keep the other elements sorted
    std::rotate(perm.begin(), perm.begin() + i, perm.begin() + i + 1);
    // Now go through all permutations of the last `n-1` elements. 
    // Keep the first element fixed. 
    do {
        cost(); // must evaluate this thread's copy `perm`, not the shared array
    } while (std::next_permutation(perm.begin() + 1, perm.end()));
}
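For completeness, a self-contained sketch of the same idea might look like the following. The distance matrix and the names tour_cost and best_tour_cost are assumptions of mine, standing in for your cost() and your data; the nodes are numbered 0 to n-1 here. Each loop iteration tracks its own local minimum and merges it into the shared result inside a critical section, so threads never write to shared data at the same time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Hypothetical cost function: length of the closed tour that visits the
// nodes in the order given by perm and then returns to the first node.
int tour_cost(const Matrix& dist, const std::vector<int>& perm) {
    int total = 0;
    for (std::size_t i = 0; i < perm.size(); ++i) {
        total += dist[perm[i]][perm[(i + 1) % perm.size()]];
    }
    return total;
}

// Brute-force search over all permutations, split by the first element.
int best_tour_cost(const Matrix& dist) {
    const int n = static_cast<int>(dist.size());
    assert(n > 0);
    std::vector<int> base(n);
    for (int i = 0; i < n; ++i) base[i] = i;  // sorted: 0, 1, ..., n-1

    int best = std::numeric_limits<int>::max();
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        std::vector<int> perm = base;  // private copy for this iteration
        // Move element i to the front; the remaining elements stay sorted.
        std::rotate(perm.begin(), perm.begin() + i, perm.begin() + i + 1);
        int local_best = std::numeric_limits<int>::max();
        do {
            local_best = std::min(local_best, tour_cost(dist, perm));
        } while (std::next_permutation(perm.begin() + 1, perm.end()));
        // Merge this iteration's result; only one thread at a time.
        #pragma omp critical
        best = std::min(best, local_best);
    }
    return best;
}
```

Each iteration covers (n-1)! permutations, so there are only n work items to hand out. That is fine as long as n is at least the number of threads.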

Thank you very much! It is working perfectly now, I've learnt more about OpenMP from your and @erip answer than during 1,5h long lecture at my Uni. — Siemko, Nov 08 '15 at 17:27

score 1 · Answer 2 · answered Nov 08 '15 at 15:36

Most definitely.

The big problem with parallelizing these permutation problems is that in order to parallelize well, you need to "index" into an arbitrary permutation. In short, you need to find the kth permutation. You can take advantage of some cool math properties and you'll find this:

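// Returns the k-th (0-based) lexicographic permutation of the sorted vector V.
// fact(n) is assumed to return n! (definition not shown).
std::vector<int> kth_perm(long long k, std::vector<int> V) {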
    long long int index;
    long long int next;
    std::vector<int> new_v;
    while(V.size()) {
        index = k / fact(V.size() - 1);
        new_v.push_back(V.at(index));
        next = k % fact(V.size() - 1);
        V.erase(V.begin() + index);
        k = next;
    }
    return new_v;
}
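Since fact() isn't shown above, here is a self-contained sketch of the same computation. The names factorial and kth_permutation are my own for this example, and the input is assumed to be sorted and free of duplicates. The idea: the permutations are grouped into blocks of (n-1)! that share the same first element, so k / (n-1)! picks the first element and k % (n-1)! is the index within that block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper standing in for fact(); n! overflows long long for n > 20.
long long factorial(std::size_t n) {
    assert(n <= 20);
    long long result = 1;
    for (std::size_t i = 2; i <= n; ++i) {
        result *= static_cast<long long>(i);
    }
    return result;
}

// Returns the k-th permutation (0-based, lexicographic order) of `elements`,
// which must be sorted in ascending order and contain no duplicates.
std::vector<int> kth_permutation(long long k, std::vector<int> elements) {
    assert(k >= 0 && k < factorial(elements.size()));
    std::vector<int> result;
    result.reserve(elements.size());
    while (!elements.empty()) {
        // Number of permutations that share the same leading element.
        const long long block = factorial(elements.size() - 1);
        const std::ptrdiff_t index = static_cast<std::ptrdiff_t>(k / block);
        result.push_back(elements[static_cast<std::size_t>(index)]);
        elements.erase(elements.begin() + index);
        k %= block;  // position inside the chosen block
    }
    return result;
}
```

For example, the permutations of {1, 2, 3} in order are 123, 132, 213, 231, 312, 321, so index 3 gives {2, 3, 1}.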

So then your logic might look something like this:

long long int start = (numperms*threadnum)/ numthreads;
long long int end = threadnum == numthreads-1 ? numperms : (numperms*(threadnum+1))/numthreads;

perm = kth_perm(start, perm); // perm must initially hold the elements in sorted order

for (long long j = start; j < end; ++j){
    if (is_valid_tour(adj_list, perm, startingVertex, endingVertex)) {
        isValidTour=true;
        return perm;
    }
    std::next_permutation(perm.begin(),perm.end());
}

isValidTour = false;
return perm;

Obviously there's a lot of code, but the idea of parallelizing it can be captured by the little code I've posted. You can visualize "indexing" like this:

|--------------------------------|
^        ^                   ^
t1      t2        ...        tn

Find the ith permutation and let a thread call std::next_permutation until it finds the starting point of the next thread.

Note that you'll want to call the function that contains the code above from inside a #pragma omp parallel region, so that each thread processes its own range.
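Putting the pieces together, a rough self-contained sketch could look like this. It is not my original code: score() is a made-up stand-in for whatever you evaluate per permutation, min_score_chunked and the other names are hypothetical, and instead of using the thread number directly it loops over a fixed number of chunks with #pragma omp parallel for, which hands the chunks to the threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: n! (overflows long long for n > 20).
long long factorial(std::size_t n) {
    assert(n <= 20);
    long long r = 1;
    for (std::size_t i = 2; i <= n; ++i) r *= static_cast<long long>(i);
    return r;
}

// k-th (0-based) lexicographic permutation of sorted, duplicate-free elements.
std::vector<int> kth_permutation(long long k, std::vector<int> elements) {
    std::vector<int> result;
    while (!elements.empty()) {
        const long long block = factorial(elements.size() - 1);
        const std::ptrdiff_t index = static_cast<std::ptrdiff_t>(k / block);
        result.push_back(elements[static_cast<std::size_t>(index)]);
        elements.erase(elements.begin() + index);
        k %= block;
    }
    return result;
}

// Made-up stand-in for the per-permutation work: sum of (position + 1) * value.
long long score(const std::vector<int>& perm) {
    long long s = 0;
    for (std::size_t i = 0; i < perm.size(); ++i) {
        s += static_cast<long long>(i + 1) * perm[i];
    }
    return s;
}

// Minimum score over all permutations, searched in num_chunks index ranges.
long long min_score_chunked(const std::vector<int>& sorted_elements, int num_chunks) {
    assert(num_chunks > 0);
    const long long num_perms = factorial(sorted_elements.size());
    const long long base = num_perms / num_chunks;   // minimum chunk length
    const long long extra = num_perms % num_chunks;  // first `extra` chunks get one more
    long long best = std::numeric_limits<long long>::max();

    #pragma omp parallel for
    for (int c = 0; c < num_chunks; ++c) {
        const long long start = c * base + std::min<long long>(c, extra);
        const long long end = start + base + (c < extra ? 1 : 0);
        if (start < end) {
            // Jump straight to this chunk's first permutation, then step.
            std::vector<int> perm = kth_permutation(start, sorted_elements);
            long long local_best = score(perm);
            for (long long j = start + 1; j < end; ++j) {
                std::next_permutation(perm.begin(), perm.end());
                local_best = std::min(local_best, score(perm));
            }
            #pragma omp critical
            best = std::min(best, local_best);
        }
    }
    return best;
}
```

The chunk boundaries are computed from a quotient and remainder rather than (numperms * c) / numchunks, which avoids overflowing long long when numperms is close to its limit. The result does not depend on the number of chunks, which makes the function easy to check against a single-chunk run.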

Thank you for your help, could you tell me one more thing, how can I know which vertex should be `starting` and `ending`? — Siemko, Nov 08 '15 at 17:31
@Siemko My code was actually originally engineered for the [Hamiltonian path](https://en.wikipedia.org/wiki/Hamiltonian_path) problem. There will be some minor changes in the algorithm, but the big takeaway is the kth permutation. That's the key to efficient parallelization. — erip, Nov 08 '15 at 17:33
