I'm fairly new to C++, so please pardon me if this is a stupid question, but I couldn't find a good example of what I'm looking for on the internet.
Basically, I'm using a parallel_for loop to find the maximum of a 2D array (along with a bunch of other operations in between). First of all, I don't even know if this is the best approach, but given the size of this 2D array, I thought splitting the calculations would be faster.
My code:
    vector<vector<double>> InterpU(1801, vector<double>(3601, 0));
    Concurrency::parallel_for(0, 1801, [&](int i) {
        long k = 0; long l = 0;
        pair<long, long> Normalized;
        double InterpPointsU[4][4];
        double jRes;
        double iRes = i * 0.1;
        double RelativeY, RelativeX;
        int p, q;
        while (iRes >= (k + 1) * DeltaTheta) k++;
        RelativeX = iRes / DeltaTheta - k;
        for (long j = 0; j < 3600; j++)
        {
            jRes = j * 0.1;
            while (jRes >= (l + 1) * DeltaPhi) l++;
            RelativeY = jRes / DeltaPhi - l;
            p = 0;
            for (long m = k - 1; m < k + 3; m++)
            {
                q = 0;
                for (long n = l - 1; n < l + 3; n++)
                {
                    Normalized = Normalize(m, n, PointsTheta, PointsPhi);
                    InterpPointsU[p][q] = U[Normalized.first][Normalized.second];
                    q++;
                }
                p++;
            }
            InterpU[i][j] = bicubicInterpolate(InterpPointsU, RelativeX, RelativeY);
            if (InterpU[i][j] > MaxU)
            {
                SharedDataLock.lock();
                MaxU = InterpU[i][j];
                SharedDataLock.unlock();
            }
        }
        InterpU[i][3600] = InterpU[i][0];
    });
As you can see, I'm using a mutex called SharedDataLock to keep multiple threads from accessing the same resource at the same time. MaxU is a variable that should only contain the maximum of the InterpU vector.
The code works well, but since I'm having performance problems, I began to look into atomic and some other stuff.
Is there a good example of how to modify code like this to make it faster?
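To make the atomic idea concrete, here is a minimal sketch of one direction this could take. It is not a tested drop-in replacement. It uses only the standard library (std::thread instead of Concurrency::parallel_for) so that it stands on its own, and the names atomic_update_max and parallel_max are hypothetical helpers, not part of the code above. Each thread tracks its own local maximum and publishes it once through a compare-and-swap loop on a std::atomic<double>. This also sidesteps the comparison InterpU[i][j] > MaxU in the loop above, which reads MaxU outside the lock and is therefore a data race in standard C++.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

// Hypothetical helper: raise `target` to `value` if `value` is larger,
// without taking a mutex.
inline void atomic_update_max(std::atomic<double>& target, double value) {
    double seen = target.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak reloads `seen` with the current value,
    // so the loop re-checks and stops as soon as `value` is no longer larger.
    while (seen < value &&
           !target.compare_exchange_weak(seen, value, std::memory_order_relaxed)) {
    }
}

// Hypothetical helper: maximum of a 2D grid, with rows interleaved across threads.
double parallel_max(const std::vector<std::vector<double>>& grid,
                    unsigned thread_count) {
    std::atomic<double> global_max{std::numeric_limits<double>::lowest()};
    thread_count = std::max(1u, thread_count);

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) {
        workers.emplace_back([&grid, &global_max, t, thread_count] {
            // Thread-private maximum: no synchronization inside the hot loop.
            double local_max = std::numeric_limits<double>::lowest();
            for (std::size_t i = t; i < grid.size(); i += thread_count)
                for (double v : grid[i])
                    local_max = std::max(local_max, v);
            // One shared update per thread instead of one per element.
            atomic_update_max(global_max, local_max);
        });
    }
    for (auto& w : workers) w.join();
    return global_max.load();
}
```

Applied to the loop above, the same idea would presumably mean keeping a local double inside the lambda, updating it after each bicubicInterpolate call, and calling atomic_update_max once per row. That assumes MaxU can be changed to a std::atomic<double>. Whether this is measurably faster depends on how much of the time actually goes to the lock rather than to the interpolation itself.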