I am receiving an array of `Eigen::MatrixXf` and an array of `Eigen::Matrix4f` in real time. Both arrays have the same number of elements. All I am trying to do is multiply the corresponding elements of the two arrays and store each result in a third array at the same index.
Please see the code snippet below:
    #define COUNT 4

    while (all_ok())
    {
        Eigen::Matrix4f trans[COUNT];
        Eigen::MatrixXf in_data[COUNT];
        Eigen::MatrixXf out_data[COUNT];

        // at each iteration, new data is filled
        // in 'trans' and 'in_data' variables

        #pragma omp parallel num_threads(COUNT)
        {
            #pragma omp for
            for (int i = 0; i < COUNT; i++)
                out_data[i] = trans[i] * in_data[i];
        }
    }
Please note that `COUNT` is a constant. Each element of `trans` is (4 x 4) and each element of `in_data` is (4 x n), where n is approximately 500,000. To parallelize the `for` loop, I tried OpenMP as shown above. However, I don't see any significant improvement in the elapsed time of the `for` loop.
Any suggestions? Are there alternative ways to perform the same operation?
Edit: My idea is to create 4 (= `COUNT`) threads, each handling one multiplication. That way, I guess, the threads would not need to be created on every iteration!