
I am having trouble parallelizing the A* algorithm. I have tried parallelizing individual for loops, but that didn't improve anything. In fact, the serial implementation is still faster than this one. Can you help me improve this or give me some ideas?

```cpp
while(openSet.size() > 0){
    PAIR current = {};
    int maxVal = INT32_MAX;

    int i;
    #pragma omp parallel for num_threads(tc) ordered schedule(dynamic, 1) private(i) shared(openSet, maxVal, current, fScores)
    for(i = 0;i < openSet.size();i++){
        if(fScores[openSet[i].x * dim + openSet[i].y] < maxVal){
            #pragma omp ordered
            maxVal = fScores[openSet[i].x * dim + openSet[i].y];
            current = openSet[i];
        }
    }

    if(current.x == xEnd && current.y == yEnd){
        elapsed = omp_get_wtime() - start;
        //printMat(gScores, dim);
        printPath("res.txt", mat, cameFrom, current, dim, tc);
        break;
    }

    int rm = check_remove(openSet, current, tc);
    openSet.erase(openSet.begin() + rm);

    vector<PAIR> neighbours;

    if(current.x - 1 >= 0 && mat[(current.x - 1) * dim + current.y] != '1'){
        neighbours.push_back(PAIR(current.x - 1, current.y));
    }
    if (current.y - 1 >= 0 && mat[current.x * dim + (current.y - 1)] != '1'){
        neighbours.push_back(PAIR(current.x, current.y - 1));
    }
    if (current.x + 1 < dim && mat[(current.x + 1) * dim + current.y] != '1'){
        neighbours.push_back(PAIR(current.x + 1, current.y));
    }
    if (current.y + 1 < dim && mat[current.x * dim + (current.y + 1)] != '1'){
        neighbours.push_back(PAIR(current.x, current.y + 1));
    }
    
    int tentative_gScore;
    #pragma omp parallel for num_threads(tc) ordered schedule(dynamic, 1) private(i) shared(neighbours, openSet, gScores, fScores, tentative_gScore)
    for(i = 0;i < neighbours.size();i++){
        tentative_gScore = gScores[current.x * dim + current.y] + 1;

        if(tentative_gScore < gScores[neighbours[i].x * dim + neighbours[i].y]){
            #pragma omp ordered
            cameFrom[neighbours[i].x * dim + neighbours[i].y] = current;
            gScores[neighbours[i].x * dim + neighbours[i].y] = tentative_gScore;
            fScores[neighbours[i].x * dim + neighbours[i].y] = tentative_gScore + hScore(); //(p.x, p.y, xEnd, yEnd)
            if(contains(openSet, neighbours[i]) == false){
                openSet.push_back(neighbours[i]);
            }
        }
    }
}
```
  • A* is not a good candidate for parallelization. But you could use better data structures to make it faster. – harold Jan 14 '21 at 21:19
  • I used vectors. What do you suggest I use? – Igor Karadzic Jan 14 '21 at 21:27
  • A* is a best-first search, meaning nodes are intended to be processed in a certain order (best-first). This makes it essentially impossible to parallelize. – Alexander Guyer Jan 14 '21 at 21:29
  • @IgorKaradzic [a combination of a heap and hashmap](https://stackoverflow.com/q/41297236/555045) for the open set. I don't see any closed set in your code, but that should also be something indexable, not something that requires a full search – harold Jan 14 '21 at 21:39
  • Will try that. Thank you. – Igor Karadzic Jan 14 '21 at 22:39
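Here is a minimal sketch of the data-structure suggestion from the comments, in C++ to match the question. It uses a slightly simpler technique than the linked heap-plus-hashmap: a `std::priority_queue` keyed on fScore with lazy deletion (stale heap entries are skipped when popped) instead of decrease-key, plus flat arrays indexed by `x * dim + y` for the g-scores and the closed set. The function name `astarGrid`, the flat `std::string` map, the Manhattan heuristic and the 4-neighbour moves are assumptions modelled on the question's code, not its actual API.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical grid A*: mat is a dim x dim map stored row-major, '1' = wall.
// Open set: binary min-heap on fScore with lazy deletion (no decrease-key).
// Closed set: flat array indexed by x * dim + y, so lookups are O(1).
// Returns the shortest path length in steps, or -1 if the goal is unreachable.
int astarGrid(const std::string& mat, int dim,
              int xStart, int yStart, int xEnd, int yEnd) {
    assert(dim > 0 && static_cast<int>(mat.size()) == dim * dim);
    // Manhattan distance: consistent for unit-cost 4-neighbour moves (assumption).
    auto h = [&](int x, int y) { return std::abs(x - xEnd) + std::abs(y - yEnd); };

    std::vector<int> gScores(dim * dim, INT_MAX);
    std::vector<char> closed(dim * dim, 0);
    using Entry = std::pair<int, int>;  // (fScore, cell index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> open;

    const int start = xStart * dim + yStart;
    gScores[start] = 0;
    open.push({h(xStart, yStart), start});
    const int dx[] = {-1, 0, 1, 0}, dy[] = {0, -1, 0, 1};

    while (!open.empty()) {
        const int cur = open.top().second;  // lowest fScore in O(log n), no linear scan
        open.pop();
        if (closed[cur]) continue;  // stale entry: this cell was already expanded
        closed[cur] = 1;

        const int cx = cur / dim, cy = cur % dim;
        if (cx == xEnd && cy == yEnd) return gScores[cur];

        for (int k = 0; k < 4; k++) {
            const int nx = cx + dx[k], ny = cy + dy[k];
            if (nx < 0 || ny < 0 || nx >= dim || ny >= dim) continue;
            const int nb = nx * dim + ny;
            if (mat[nb] == '1' || closed[nb]) continue;
            const int tentative = gScores[cur] + 1;
            if (tentative < gScores[nb]) {
                gScores[nb] = tentative;
                open.push({tentative + h(nx, ny), nb});  // older entries for nb become stale
            }
        }
    }
    return -1;
}
```

Each expansion then costs O(log n) for the heap instead of a full scan of `openSet`, and membership checks become array lookups instead of calls to `contains`.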

1 Answer


In the first loop, the `ordered` clause wastes a huge amount of time. If you only needed `maxVal` (which, despite its name, holds the lowest fScore), you could avoid it with a reduction. However, since you also need `current`, you have to do the reduction manually. Instead of updating `maxVal` and `current` directly, create intermediate vectors for these variables, namely `maxValVector` and `currentVector`, with one slot per thread. Each thread then searches its share of the iterations and stores its best candidate in `maxValVector[omp_get_thread_num()]` and `currentVector[omp_get_thread_num()]`. This way you don't need the `ordered` clause, and each thread knows the lowest value it encountered. Finally, use a short serial loop over the number of threads to get `maxVal` and `current` from `maxValVector` and `currentVector`. This should make the first loop parallelize much more efficiently. (Note also that without braces, `#pragma omp ordered` only covers the single statement after it, so `current = openSet[i];` is not protected at all in your version.)
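A minimal sketch of the manual reduction described above, assuming a simplified `PAIR` and a flat `fScores` vector as in the question. `findLowestF`, `minValVector` and `minIdxVector` are hypothetical names; the per-thread slots store an index into `openSet` rather than a copy of the node, and ties are broken on that index so the result matches the serial loop. The `#ifdef _OPENMP` fallback lets it compile without `-fopenmp`, in which case it simply runs serially.

```cpp
#include <cassert>
#include <climits>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback so the sketch also builds without -fopenmp.
static int omp_get_thread_num() { return 0; }
#endif

// Simplified stand-in for the question's PAIR type (assumption).
struct PAIR {
    int x = 0, y = 0;
    PAIR() = default;
    PAIR(int x_, int y_) : x(x_), y(y_) {}
};

// Hypothetical helper: index in openSet of the node with the lowest fScore,
// ties going to the lowest index as in the serial loop; -1 if none is found.
int findLowestF(const std::vector<PAIR>& openSet, const std::vector<int>& fScores,
                int dim, int nthreads) {
    assert(nthreads > 0);
    // One slot per thread for the best value and index that thread has seen.
    std::vector<int> minValVector(nthreads, INT_MAX);
    std::vector<long> minIdxVector(nthreads, -1);
    const long n = static_cast<long>(openSet.size());

    #pragma omp parallel num_threads(nthreads)
    {
        const int t = omp_get_thread_num();
        int localVal = INT_MAX;
        long localIdx = -1;
        // Static schedule: each thread scans one contiguous chunk, no ordering needed.
        #pragma omp for schedule(static)
        for (long i = 0; i < n; i++) {
            const int f = fScores[openSet[i].x * dim + openSet[i].y];
            if (f < localVal) { localVal = f; localIdx = i; }
        }
        minValVector[t] = localVal;  // each thread writes only its own slot
        minIdxVector[t] = localIdx;
    }

    // Serial mini-loop over the per-thread results.
    int best = INT_MAX;
    long bestIdx = -1;
    for (int t = 0; t < nthreads; t++) {
        if (minIdxVector[t] < 0) continue;  // this thread found nothing
        if (minValVector[t] < best ||
            (minValVector[t] == best && minIdxVector[t] < bestIdx)) {
            best = minValVector[t];
            bestIdx = minIdxVector[t];
        }
    }
    return static_cast<int>(bestIdx);
}
```

If only the value were needed, OpenMP 3.1 and later can do this directly with `reduction(min:maxVal)`. For a small open set, the overhead of starting the parallel region may still outweigh the gain.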

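In the second loop, the variable `tentative_gScore` should be private to guarantee thread safety (declaring it inside the loop body does this). I don't see the need for the `ordered` clause here either, since each iteration writes to a different cell of `cameFrom`, `gScores` and `fScores`. The line `openSet.push_back(neighbours[i]);` is the real problem: `push_back` on a shared `std::vector` is not thread-safe, and an `atomic` construct only covers simple scalar updates, so it has to be protected by a `critical` section. The `contains(openSet, neighbours[i])` check should go inside the same critical section, because it reads `openSet` while other threads may be appending to it.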
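A minimal sketch of the second loop with those changes. It assumes simplified stand-ins for the question's `PAIR`, `contains` and `hScore` (here a Manhattan distance to the goal, a hypothetical choice since the question's `hScore()` takes no arguments), and `relaxNeighbours` is a hypothetical name. `tentative_gScore` is declared inside the loop body, which makes it private automatically; there is no `ordered` clause; and one named `critical` section wraps both the `contains` check and the `push_back`.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstdlib>
#include <vector>

// Simplified stand-in for the question's PAIR type (assumption).
struct PAIR {
    int x = 0, y = 0;
    PAIR() = default;
    PAIR(int x_, int y_) : x(x_), y(y_) {}
    bool operator==(const PAIR& o) const { return x == o.x && y == o.y; }
};

// Hypothetical stand-in for the question's contains(): a linear search.
static bool contains(const std::vector<PAIR>& v, const PAIR& p) {
    return std::find(v.begin(), v.end(), p) != v.end();
}

// Hypothetical heuristic: Manhattan distance to the goal.
static int hScore(const PAIR& p, int xEnd, int yEnd) {
    return std::abs(p.x - xEnd) + std::abs(p.y - yEnd);
}

void relaxNeighbours(const PAIR& current, const std::vector<PAIR>& neighbours,
                     std::vector<PAIR>& openSet, std::vector<int>& gScores,
                     std::vector<int>& fScores, std::vector<PAIR>& cameFrom,
                     int dim, int xEnd, int yEnd, int nthreads) {
    assert(nthreads > 0);
    const int n = static_cast<int>(neighbours.size());

    // No 'ordered': the neighbours are distinct cells, so each iteration
    // writes to different elements of cameFrom, gScores and fScores.
    #pragma omp parallel for num_threads(nthreads) schedule(static)
    for (int i = 0; i < n; i++) {
        // Declared inside the loop body, so it is private to each thread.
        const int tentative_gScore = gScores[current.x * dim + current.y] + 1;
        const PAIR nb = neighbours[i];
        const int idx = nb.x * dim + nb.y;
        if (tentative_gScore < gScores[idx]) {
            cameFrom[idx] = current;
            gScores[idx] = tentative_gScore;
            fScores[idx] = tentative_gScore + hScore(nb, xEnd, yEnd);
            // openSet is shared: the lookup and the append happen together.
            #pragma omp critical(openset_update)
            {
                if (!contains(openSet, nb)) openSet.push_back(nb);
            }
        }
    }
}
```

With at most four neighbours per expansion, this loop has very little work to share, so it may well still run faster serially.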

Noureddine