// In other words, this is equivalent to cv::Mat1f mat(5,n),
// i.e. a 5xn matrix
std::vector<cv::Mat1f> mat(5,cv::Mat1f::zeros(1,n));
std::vector<float> indexes(m);
// fill indexes
// m >> nThreads (from hundreds to thousands)
for(size_t i=0; i<m; i++){
  mat[indexes[i]] += 1;
}

The expected result is to increase each element of the selected row by one on every iteration. This is a toy example; the actual sum is far more complicated. I tried to parallelize it with:

#pragma omp declare reduction(vec_float_plus : std::vector<cv::Mat1f> : \
            std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), omp_out.begin(), std::plus<cv::Mat1f>())) \
            initializer(omp_priv=omp_orig);

#pragma omp parallel for reduction(vec_float_plus : mat)
for(size_t i=0; i<m; i++){
    mat[indexes[i]] += 1;
}

But this fails because each element of each row ends up with seemingly random values. How can I solve this?

So I found out that the problem is related to this. So I should initialize mat with:

std::vector<cv::Mat1f> mat(5);
for(size_t i=0; i<mat.size(); i++)
  mat[i] = cv::Mat1f::zeros(1,n);
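
The aliasing behind this can be reproduced without OpenCV. Below is a minimal sketch, assuming a hypothetical SharedRow type whose copies share one buffer, which is how copying a cv::Mat behaves. The names make_aliased and make_independent are illustrative only and are not OpenCV functions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for cv::Mat1f: copying shares the buffer (shallow copy).
struct SharedRow {
    std::shared_ptr<std::vector<float>> data;
    explicit SharedRow(std::size_t n = 0)
        : data(std::make_shared<std::vector<float>>(n, 0.0f)) {}
    float& operator[](std::size_t i) {
        assert(i < data->size());
        return (*data)[i];
    }
};

// Fill constructor: every element is a shallow copy of the same temporary,
// so all rows point at one buffer.
std::vector<SharedRow> make_aliased(std::size_t rows, std::size_t n) {
    return std::vector<SharedRow>(rows, SharedRow(n));
}

// Per-element construction: each row owns its own buffer.
std::vector<SharedRow> make_independent(std::size_t rows, std::size_t n) {
    std::vector<SharedRow> v;
    v.reserve(rows);
    for (std::size_t i = 0; i < rows; ++i) v.emplace_back(n);
    return v;
}
```

With make_aliased, a write through one row is visible through every other row; with make_independent, it is not.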

But then this would create problems with omp_priv = omp_orig, since it would only see std::vector<cv::Mat1f> mat(5); and its values are undefined. How can I solve this? The only solution that came to my mind is to create a wrapper structure, something like:

class vectMat{
public:
    vectMat(size_t rows, size_t j){
        for(size_t i=0; i<rows; i++)
            mats.push_back(cv::Mat1f::zeros(1,j));
    }
private:
    std::vector<cv::Mat1f> mats;
};

But then what should I implement to make it work with the rest of the code?

justHelloWorld

1 Answer


Types such as cv::Mat1f, which share their data on copy instead of copying it, are indeed dangerous in this context. You can get a clear, explicit solution by splitting the parallel region from the for loop and initializing each thread's private copy yourself.

#pragma omp declare reduction(vec_mat1f_plus : std::vector<cv::Mat1f> : \
            std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), omp_out.begin(), std::plus<cv::Mat1f>()));
// initializer not necessary if you initialize explicitly

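// Size the shared result before the region: the combiner iterates over omp_out,
// which would otherwise be an empty vector.
std::vector<cv::Mat1f> mat(5);
for (auto& elem : mat) {
  elem = cv::Mat1f::zeros(1, n);
}
#pragma omp parallel reduction(vec_mat1f_plus : mat)
{
  // Each thread builds its own private copy with independent buffers.
  mat = std::vector<cv::Mat1f>(5);
  for (auto& elem : mat) {
    elem = cv::Mat1f::zeros(1, n);
  }
  #pragma omp for
  for(size_t i=0; i<m; i++){
    mat[indexes[i]] += 1;
  }
}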

I haven't tested whether std::plus<cv::Mat1f> works, but it looks good.
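
The split-region pattern can be tried without OpenCV. The following is a minimal sketch, assuming a std::vector<std::vector<float>> as a stand-in for std::vector<cv::Mat1f>; the names Rows, merge_rows and count_rows are hypothetical. It sizes the shared vector before the region so that the combiner, which iterates over omp_out, has elements to merge into. Without OpenMP enabled the pragmas are ignored and the function gives the same result serially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for std::vector<cv::Mat1f>; each inner vector plays the role of one row.
using Rows = std::vector<std::vector<float>>;

// Element-wise merge of one partial result into another.
inline void merge_rows(Rows& out, const Rows& in) {
    assert(out.size() == in.size());
    for (std::size_t r = 0; r < out.size(); ++r) {
        std::transform(out[r].begin(), out[r].end(), in[r].begin(),
                       out[r].begin(), std::plus<float>());
    }
}

// No initializer clause: private copies start empty and are filled explicitly below.
#pragma omp declare reduction(rows_plus : Rows : merge_rows(omp_out, omp_in))

Rows count_rows(const std::vector<std::size_t>& indexes,
                std::size_t rows, std::size_t n) {
    // Shared total, sized and zeroed up front.
    Rows mat(rows, std::vector<float>(n, 0.0f));
    const std::size_t m = indexes.size();
    #pragma omp parallel reduction(rows_plus : mat)
    {
        // Inside the region, mat names the thread's private copy.
        mat = Rows(rows, std::vector<float>(n, 0.0f));
        #pragma omp for
        for (std::size_t i = 0; i < m; ++i) {
            for (float& x : mat[indexes[i]]) x += 1.0f;
        }
    }
    return mat;
}
```

Each row of the result holds the number of times its index occurred, repeated in every column.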

Your approach with vectMat will also work if you provide an operator= that deep-copies the underlying Mat with clone(), and keep the initializer.
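
The wrapper idea can be sketched as follows. This is only an illustration under stated assumptions: SharedRow is a hypothetical stand-in for cv::Mat1f (its plain copy shares the buffer, and clone() plays the role of cv::Mat::clone()), and VectRows, add_to_row and accumulate are made-up names. It also assumes the accumulator starts at zero, since omp_priv = omp_orig would otherwise add the original contents once per thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for cv::Mat1f: a plain copy shares the buffer.
struct SharedRow {
    std::shared_ptr<std::vector<float>> data;
    explicit SharedRow(std::size_t n = 0)
        : data(std::make_shared<std::vector<float>>(n, 0.0f)) {}
    // Analogue of cv::Mat::clone(): returns a copy that owns a fresh buffer.
    SharedRow clone() const {
        SharedRow copy;
        copy.data = std::make_shared<std::vector<float>>(*data);
        return copy;
    }
};

// Wrapper in the spirit of vectMat: every copy is deep.
class VectRows {
public:
    VectRows() = default;
    VectRows(std::size_t rows, std::size_t n) {
        for (std::size_t i = 0; i < rows; ++i) rows_.emplace_back(n);
    }
    VectRows(const VectRows& other) { deep_copy_from(other); }
    VectRows& operator=(const VectRows& other) {
        if (this != &other) deep_copy_from(other);
        return *this;
    }
    // Adds v to every element of one row, like mat[row] += v on a cv::Mat1f.
    void add_to_row(std::size_t row, float v) {
        for (float& x : *rows_[row].data) x += v;
    }
    // Element-wise sum, used as the reduction combiner.
    VectRows& operator+=(const VectRows& rhs) {
        assert(rows_.size() == rhs.rows_.size());
        for (std::size_t r = 0; r < rows_.size(); ++r) {
            std::vector<float>& a = *rows_[r].data;
            const std::vector<float>& b = *rhs.rows_[r].data;
            for (std::size_t c = 0; c < a.size(); ++c) a[c] += b[c];
        }
        return *this;
    }
    float at(std::size_t row, std::size_t col) const {
        return (*rows_[row].data)[col];
    }
private:
    void deep_copy_from(const VectRows& other) {
        std::vector<SharedRow> fresh;
        for (const SharedRow& r : other.rows_) fresh.push_back(r.clone());
        rows_.swap(fresh);
    }
    std::vector<SharedRow> rows_;
};

// The initializer is safe here because copying a VectRows clones every row.
#pragma omp declare reduction(vect_rows_plus : VectRows : omp_out += omp_in) \
    initializer(omp_priv = omp_orig)

VectRows accumulate(const std::vector<std::size_t>& indexes,
                    std::size_t rows, std::size_t n) {
    VectRows acc(rows, n);  // assumed to start at zero
    const std::size_t m = indexes.size();
    #pragma omp parallel for reduction(vect_rows_plus : acc)
    for (std::size_t i = 0; i < m; ++i) {
        acc.add_to_row(indexes[i], 1.0f);
    }
    return acc;
}
```

The rest of the code then only needs whatever the loop body and the combiner call on the wrapper, here add_to_row and operator+=.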

Zulan