// In other words, this is equivalent to cv::Mat1f mat(5,n),
// i.e. a 5 x n matrix
std::vector<cv::Mat1f> mat(5, cv::Mat1f::zeros(1,n));
std::vector<float> indexes(m);
// fill indexes
// m >> nThreads (from hundreds to thousands)
for(size_t i=0; i<m; i++){
    mat[indexes[i]] += 1;
}
The expected result is to increase each element of each row by one. This is a toy example; the actual sum is far more complicated. I tried to parallelize it with:
#pragma omp declare reduction(vec_float_plus : std::vector<cv::Mat1f> : \
    std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), omp_out.begin(), std::plus<cv::Mat1f>())) \
    initializer(omp_priv = omp_orig)
#pragma omp parallel for reduction(vec_float_plus : mat)
for(size_t i=0; i<m; i++){
    mat[indexes[i]] += 1;
}
But this fails because each element of each row ends up randomly initialized. How can I solve this?
I found out that the problem is related to this, so I should instead initialize mat with:
std::vector<cv::Mat1f> mat(5);
for(size_t i=0; i<mat.size(); i++)
mat[i] = cv::Mat1f::zeros(1,n);
But then this would create problems with omp_priv = omp_orig, since the initializer would copy std::vector<cv::Mat1f> mat(5), whose values are undefined. How can I solve this? The only solution that came to my mind is to create a wrapper structure, something like:
class vectMat{
public:
    vectMat(size_t rows, size_t cols){
        for(size_t i=0; i<rows; i++)
            mats.push_back(cv::Mat1f::zeros(1, cols));
    }
private:
    std::vector<cv::Mat1f> mats;
};
But then what should I implement to make it work with the rest of the code?