My function sorts a point cloud of roughly 17,000 points (the count fluctuates), extracts the relevant points from it, and stores them in a vector. Everything works fine, but it's very slow, so I am trying to use OpenMP to parallelize the task, but I'm getting a crash.

Here is the multi-threaded version which doesn't work:

void IntervalMapEstimator::extract_relevant_points_multithread(std::vector<Point3D>& relevant_points, std::vector<Point3D>& pointcloud, doubleIE cell_min_angle_sensor_rot, doubleIE cell_max_angle_sensor_rot)
{

#pragma omp parallel for shared( pointcloud, cell_min_angle_sensor_rot, cell_max_angle_sensor_rot)
        for(int i = 0; i < pointcloud.size(); i++) {
            //int numThread = omp_get_thread_num();
            //std::cout << "numThread = " << numThread << std::endl;

            // Check whether the cell is between the 2nd and 3rd quadrant (--> e.g. -170 to 170°)
            if ( cell_min_angle_sensor_rot < 0 && cell_max_angle_sensor_rot >= 0 && abs(cell_min_angle_sensor_rot) > M_PI/2 && abs(cell_max_angle_sensor_rot) > M_PI/2) {
                // Point must be smaller than the minimum angle and bigger than the max angle (e.g. min-angle: -1.5 max-angle: 1.5 point angle bigger than 1.5 or smaller than -1.5)
                if ( pointcloud[i].pol_sensor_rot.phi <= cell_min_angle_sensor_rot || pointcloud[i].pol_sensor_rot.phi  >= cell_max_angle_sensor_rot ) {
                    relevant_points.push_back(pointcloud[i]);
                }

            } else {
                 if (pointcloud[i].pol_sensor_rot.phi  >= cell_min_angle_sensor_rot && pointcloud[i].pol_sensor_rot.phi  <= cell_max_angle_sensor_rot ) {
                     relevant_points.push_back(pointcloud[i]);
                 }
            }
        }

}

And here is the output I get:

7fcc93737000-7fcc9374b000 r-xp 00012000 103:05 7078283                   /lib/x86_64-linux-gnu/ld-2.23.so
7fcc938a3000-7fcc938f7000 rw-p 00000000 00:00 0 
7fcc9391e000-7fcc9392c000 rw-p 00000000 00:00 0 
7fcc93937000-7fcc93939000 rw-p 00000000 00:00 0 
7fcc93947000-7fcc9394a000 rw-p 00000000 00:00 0 
7fcc9394a000-7fcc9394b000 r--p 00025000 103:05 7078283                   /lib/x86_64-linux-gnu/ld-2.23.so
7fcc9394b000-7fcc9394c000 rw-p 00026000 103:05 7078283                   /lib/x86_64-linux-gnu/ld-2.23.so
7fcc9394c000-7fcc9394d000 rw-p 00000000 00:00 0 
7fff20b58000-7fff20b7a000 rw-p 00000000 00:00 0                          [stack]
7fff20bb8000-7fff20bbb000 r--p 00000000 00:00 0                          [vvar]
7fff20bbb000-7fff20bbd000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
[interval_map-1] process has died [pid 12700, exit code -6, cmd /home/catkin_ws/SWC_INTERVAL_MAP/devel/lib/interval_map_test/interval_map_test __name:=interval_map __log:=/home/.ros/log/615acdf0-3714-11e8-bc07-9cebe84f847e/interval_map-1.log].
log file: /home/.ros/log/615acdf0-3714-11e8-bc07-9cebe84f847e/interval_map-1*.log
all processes on machine have died, roslaunch will exit
shutting down processing monitor...
... shutting down processing monitor complete
done

For now, my best guess is that some kind of queue builds up, either for reading values from the point cloud vector or for the push_back calls on the relevant_points vector, and it keeps growing until it eventually blows up the stack. Does anyone have an idea what the problem is?

tony497
  • Does every thread get its own copy of relevant_points? If not, it could be a problem if multiple threads push back into the same vector. – haggi krey Apr 03 '18 at 12:44
  • I just edited my question, sorry, I had posted the single-threaded version. As you can see now, the relevant_points vector is not shared between the threads, so I'm guessing OpenMP does some kind of waiting list for the push_back; I'm still looking into that. The pointcloud vector, on the other hand, is shared between all the threads, but since I'm only reading values from it I'm guessing that's OK. – tony497 Apr 03 '18 at 12:50
  • Variables are shared by default, you need to explicitly make `relevant_points` private – Mike van Dyke Apr 03 '18 at 12:53
  • OMP does not do any race-condition checking as far as I know, and relevant_points is declared outside the loop and is therefore shared by default. So you push back into the same vector from multiple threads, which could be a problem. – haggi krey Apr 03 '18 at 12:58

3 Answers

So, the problem is that OpenMP's private clause doesn't work for a vector passed by reference. This doesn't work (it raises an error about the reference type):

#pragma omp parallel for private (relevant_points) 

My workaround was to use lock_guard from Boost:

    void IntervalMapEstimator::extract_relevant_points_multithread(std::vector<Point3D>& relevant_points, std::vector<Point3D>& pointcloud, doubleIE cell_min_angle_sensor_rot, doubleIE cell_max_angle_sensor_rot)
    {

        boost::mutex mutex_relevant_points;
        omp_set_num_threads (5);

#pragma omp parallel for shared (relevant_points, pointcloud, cell_min_angle_sensor_rot, cell_max_angle_sensor_rot)
        for(int i = 0; i < pointcloud.size(); i++) {
            //int numThread = omp_get_thread_num();
            //std::cout << "numThread = " << numThread << std::endl;

            // Check whether the cell is between the 2nd and 3rd quadrant (--> e.g. -170 to 170°)
            if ( cell_min_angle_sensor_rot < 0 && cell_max_angle_sensor_rot >= 0 && abs(cell_min_angle_sensor_rot) > M_PI/2 && abs(cell_max_angle_sensor_rot) > M_PI/2) {
                // Point must be smaller than the minimum angle and bigger than the max angle (e.g. min-angle: -1.5 max-angle: 1.5 point angle bigger than 1.5 or smaller than -1.5)
                if ( pointcloud[i].pol_sensor_rot.phi <= cell_min_angle_sensor_rot || pointcloud[i].pol_sensor_rot.phi  >= cell_max_angle_sensor_rot ) {                   
                    boost::lock_guard<boost::mutex> lock(mutex_relevant_points);
                    relevant_points.push_back(pointcloud[i]);
                }
            } else {
                 if (pointcloud[i].pol_sensor_rot.phi  >= cell_min_angle_sensor_rot && pointcloud[i].pol_sensor_rot.phi  <= cell_max_angle_sensor_rot ) {
                    boost::lock_guard<boost::mutex> lock(mutex_relevant_points);
                    relevant_points.push_back(pointcloud[i]);
                 }
            }
        }
}

You can also see that I set the number of threads; otherwise the program processing the points was taking all the threads, and the program sending the points didn't have any thread left to work with. I hope it will help someone.

tony497

Actually, what you are trying to achieve here is very similar to a reduction with a vector as the reduction variable.

Here is how I would tackle it if I wanted to parallelize it:

void IntervalMapEstimator::extract_relevant_points_multithread( std::vector<Point3D>& relevant_points,
                                                                std::vector<Point3D>& pointcloud,
                                                                doubleIE cell_min_angle_sensor_rot,
                                                                doubleIE cell_max_angle_sensor_rot ) {
    size_t total_size = 0; // shared between the threads, accumulated under the critical below
    #pragma omp parallel num_threads( 5 )
    {
        std::vector<Point3D> local_relevant_points;
        #pragma omp for
        for ( int i = 0; i < (int) pointcloud.size(); i++ ) {
            // Check whether the cell is between the 2nd and 3rd quadrant (--> e.g. -170 to 170°)
            if ( cell_min_angle_sensor_rot < 0 && cell_max_angle_sensor_rot >= 0
               && abs( cell_min_angle_sensor_rot ) > M_PI/2 && abs( cell_max_angle_sensor_rot ) > M_PI/2 ) {
                // Point must be smaller than the minimum angle and bigger than the max angle
                // (e.g. min-angle: -1.5 max-angle: 1.5 point angle bigger than 1.5 or smaller than -1.5)
                if ( pointcloud[i].pol_sensor_rot.phi <= cell_min_angle_sensor_rot
                   || pointcloud[i].pol_sensor_rot.phi >= cell_max_angle_sensor_rot ) {
                    local_relevant_points.push_back( pointcloud[i] );
                }
            }
            else {
                if ( pointcloud[i].pol_sensor_rot.phi >= cell_min_angle_sensor_rot
                   && pointcloud[i].pol_sensor_rot.phi <= cell_max_angle_sensor_rot ) {
                    local_relevant_points.push_back( pointcloud[i] );
                }
            }
        }
        #pragma omp critical
        total_size += local_relevant_points.size();
        #pragma omp barrier
        #pragma omp single
        relevant_points.reserve( total_size );
        // the implicit barrier at the end of "single" guarantees reserve() ran before any merge
        #pragma omp critical
        relevant_points.insert( relevant_points.end(),
                                local_relevant_points.begin(),
                                local_relevant_points.end() );
    }
}

The idea is to do the filtering locally in parallel, and then to accumulate the local results into the global output. I've added an extra refinement: computing the expected size of the result in order to reserve the right capacity in advance, sparing some lengthy extra memory allocations.

Now, 2 caveats:

  1. I haven't compiled it, so there might be some typos in there;
  2. I'm not sure parallelizing this will bring much benefit since the whole process is likely to be both too short and memory bound... But there you go anyway.
Gilles

Here is what my function looks like now:

void IntervalMapEstimator::extract_relevant_points_multithread(std::vector<Point3D>&  relevant_points ,std::vector<Point3D>& pointcloud, doubleIE cell_min_angle_sensor_rot, doubleIE cell_max_angle_sensor_rot)
{
    relevant_points.reserve (pointcloud.size ());

#pragma omp parallel for shared (relevant_points, pointcloud, cell_min_angle_sensor_rot, cell_max_angle_sensor_rot) num_threads(5)
        for(int i = 0; i < pointcloud.size(); i++) {
            //int numThread = omp_get_thread_num();
            //std::cout << "numThread = " << numThread << std::endl;

            // Check whether the cell is between the 2nd and 3rd quadrant (--> e.g. -170 to 170°)
            if ( cell_min_angle_sensor_rot < 0 && cell_max_angle_sensor_rot >= 0 && abs(cell_min_angle_sensor_rot) > M_PI/2 && abs(cell_max_angle_sensor_rot) > M_PI/2) {
                // Point must be smaller than the minimum angle and bigger than the max angle (e.g. min-angle: -1.5 max-angle: 1.5 point angle bigger than 1.5 or smaller than -1.5)
                if ( pointcloud[i].pol_sensor_rot.phi <= cell_min_angle_sensor_rot || pointcloud[i].pol_sensor_rot.phi  >= cell_max_angle_sensor_rot ) {                   
#pragma omp critical(push_in_relevant_points)
                    relevant_points.push_back(pointcloud[i]);
                }
            } else {
                 if (pointcloud[i].pol_sensor_rot.phi  >= cell_min_angle_sensor_rot && pointcloud[i].pol_sensor_rot.phi  <= cell_max_angle_sensor_rot ) {
#pragma omp critical(push_in_relevant_points)
                     relevant_points.push_back(pointcloud[i]);
                 }
            }
        }
}

I used omp critical instead of lock_guard for more consistency, and I do a reserve at the beginning, because in my case the relevant_points vector is never going to be bigger than the point cloud, and most of the time it contains only 50 to 100 elements. And I'm no longer using a copy of the vector but a reference.

tony497