I originally had a single-threaded loop which iterates over all pixels of an image and may perform various operations on the data.
The library I am using dictates that pixels must be retrieved from an image one line at a time. To this end I malloc a block of memory which can hold one row of pixels (BMM_Color_fl is a struct containing one pixel's RGBA data as four float values, and GetLinearPixels() copies one row of pixels from a bitmap into a BMM_Color_fl array).
BMM_Color_fl* line = (BMM_Color_fl*)malloc(width * sizeof(BMM_Color_fl)); // Buffer for one row of pixels.
for (int y = 0; y < height; y++)
{
    bmp->GetLinearPixels(0, y, width, line); // Copy data of row y from bitmap into line.
    BMM_Color_fl* pixel = line;              // Get first pixel of line.
    for (int x = 0; x < width; x++, pixel++) // For each pixel in the row...
    {
        // Do stuff with a pixel.
    }
}
free(line);
So far so good!
To reduce the execution time of this loop, I have written a concurrent version using parallel_for, which looks like this:
parallel_for(0, height, [&](int y)
{
    // The row buffer is allocated and freed once per row (i.e. per iteration).
    BMM_Color_fl* line = (BMM_Color_fl*)malloc(width * sizeof(BMM_Color_fl));
    bmp->GetLinearPixels(0, y, width, line);
    BMM_Color_fl* pixel = line;
    for (int x = 0; x < width; x++, pixel++)
    {
        // Do stuff with a pixel.
    }
    free(line);
});
While the multithreaded loop is already faster than the original, I realize that all threads cannot share the same memory block, so currently I allocate and free the buffer in every loop iteration. This is obviously wasteful, since there will typically be far fewer threads than loop iterations.
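One way to cut the allocations from one per row to one per thread is to hand each thread a contiguous block of rows instead of a single row. The sketch below assumes portable C++11, with std::thread standing in for parallel_for; `Pixel`, `get_row` and `process_in_blocks` are hypothetical stand-ins for BMM_Color_fl, GetLinearPixels() and the loop, not part of the bitmap library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for BMM_Color_fl: one pixel's RGBA data as four floats.
struct Pixel { float r, g, b, a; };

// Hypothetical stand-in for bmp->GetLinearPixels(0, y, width, out):
// writes row y of a synthetic image into out[0..width).
void get_row(int y, int width, Pixel* out) {
    for (int x = 0; x < width; ++x)
        out[x] = Pixel{static_cast<float>(x), static_cast<float>(y), 0.0f, 1.0f};
}

// Splits [0, height) into one contiguous block of rows per thread. Each thread
// allocates a single row buffer, reuses it for its whole block, and frees it
// (via the vector's destructor) when the block is finished.
std::vector<long long> process_in_blocks(int width, int height, unsigned threads) {
    assert(threads > 0 && width > 0);
    std::vector<long long> partial(threads, 0);  // one result slot per thread
    std::vector<std::thread> pool;
    const int n = static_cast<int>(threads);
    const int block = (height + n - 1) / n;      // rows per thread, rounded up
    for (unsigned t = 0; t < threads; ++t) {
        const int begin = static_cast<int>(t) * block;
        const int end = std::min(height, begin + block);
        pool.emplace_back([=, &partial] {
            std::vector<Pixel> line(static_cast<std::size_t>(width)); // allocated once
            for (int y = begin; y < end; ++y) {
                get_row(y, width, line.data());
                for (int x = 0; x < width; ++x)  // "Do stuff with a pixel."
                    partial[t] += static_cast<long long>(line[x].r + line[x].g);
            }
        });
    }
    for (auto& th : pool) th.join();
    return partial;
}
```

Each thread writes only its own slot of `partial`, so no locking is needed; the per-thread results are summed afterwards. The trade-off is static scheduling: if some rows are much more expensive than others, one block can finish long after the rest.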
My question is whether, and how, I can have each thread malloc exactly one line buffer and use it repeatedly (and, ideally, free it at the end).
- As a disclaimer, I must state that I am a novice C++ user.
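An alternative to allocating inside every iteration is a `thread_local` row buffer in the loop body: each thread grows it the first time it handles a row and reuses it for every later row. This is a minimal sketch assuming C++11 `thread_local` support; a few std::thread workers pulling row indices from an atomic counter stand in for parallel_for, and `Pixel`, `get_row`, `process_row` and `process_image` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for BMM_Color_fl: one pixel's RGBA data as four floats.
struct Pixel { float r, g, b, a; };

// Hypothetical stand-in for bmp->GetLinearPixels(0, y, width, out).
void get_row(int y, int width, Pixel* out) {
    for (int x = 0; x < width; ++x)
        out[x] = Pixel{static_cast<float>(x), static_cast<float>(y), 0.0f, 1.0f};
}

static std::atomic<int> g_buffers_allocated{0};

// Loop body for one row. The buffer is thread_local, so each thread
// allocates it once and reuses it on every row it processes.
long long process_row(int y, int width) {
    assert(width > 0);
    thread_local std::vector<Pixel> line;
    if (line.size() < static_cast<std::size_t>(width)) {
        line.resize(static_cast<std::size_t>(width)); // resize (not reserve) so line[x] is valid
        ++g_buffers_allocated;
    }
    get_row(y, width, line.data());
    long long sum = 0;
    for (int x = 0; x < width; ++x)  // "Do stuff with a pixel."
        sum += static_cast<long long>(line[x].r) + static_cast<long long>(line[x].g);
    return sum;
}

struct Result { long long total; int buffers; };

// Stand-in for parallel_for: a fixed set of threads pulls row indices
// from a shared counter until every row has been processed.
Result process_image(int width, int height, unsigned threads) {
    assert(threads > 0);
    g_buffers_allocated = 0;
    std::atomic<int> next_row{0};
    std::atomic<long long> total{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int y = next_row++; y < height; y = next_row++)
                total += process_row(y, width);
        });
    for (auto& th : pool) th.join();
    return {total.load(), g_buffers_allocated.load()};
}
```

With 4 workers, at most 4 buffers are allocated no matter how many rows there are. One caveat: a `thread_local` buffer is released only when its thread exits, which for a runtime-managed pool (such as the one behind parallel_for) may be long after the loop has finished.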
Implementation of the suggested solutions:
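Concurrency::combinable<std::vector<BMM_Color_fl>> line; // One vector per thread (needs <ppl.h> and <vector>).
parallel_for(0, height, [&](int y)
{
    // local() returns a reference to this thread's vector. Bind it by reference,
    // otherwise every iteration works on (and then discards) a fresh copy.
    std::vector<BMM_Color_fl>& lineL = line.local();
    // resize, not reserve: reserve only sets capacity and leaves size() at 0,
    // so &lineL[0] and lineL[x] would be out of bounds.
    if (lineL.size() < static_cast<std::size_t>(width)) lineL.resize(width);
    bmp->GetLinearPixels(0, y, width, lineL.data());
    for (int x = 0; x < width; x++)
    {
        BMM_Color_fl* pixel = &lineL[x];
        // Do stuff with a pixel.
    }
});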
As suggested, I canned the malloc and replaced it with a std::vector, one per thread via Concurrency::combinable, sized to one row's width.
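Two std::vector details matter when a per-thread buffer is used this way. First, `reserve()` only guarantees capacity and leaves `size()` at 0, so indexing the vector afterwards is undefined behavior; `resize()` is what makes `width` elements exist. Second, `combinable::local()` returns a reference, and binding it to a plain `std::vector` (rather than `std::vector&`) makes a copy, so growing that copy never touches the stored per-thread vector. The sketch below shows both effects in portable C++ without PPL; `Pixel` is a hypothetical stand-in for BMM_Color_fl, and the "owner" vector merely plays the role of the storage behind `local()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pixel { float r, g, b, a; };  // hypothetical stand-in for BMM_Color_fl

struct VectorFacts {
    std::size_t size_after_reserve;
    std::size_t capacity_after_reserve;
    std::size_t size_after_resize;
};

// reserve() allocates but creates no elements; resize() creates them.
VectorFacts reserve_vs_resize(std::size_t width) {
    std::vector<Pixel> a;
    a.reserve(width);  // capacity() >= width, size() is still 0
    std::vector<Pixel> b;
    b.resize(width);   // size() == width, elements value-initialized
    return {a.size(), a.capacity(), b.size()};
}

// Growing a copy leaves the original untouched.
std::size_t owner_size_after_copy_grow(std::size_t width) {
    std::vector<Pixel> owner;         // plays the role of the per-thread storage
    std::vector<Pixel> copy = owner;  // like binding local() to a non-reference
    copy.resize(width);
    return owner.size();              // still 0
}

// Growing through a reference changes the original.
std::size_t owner_size_after_ref_grow(std::size_t width) {
    std::vector<Pixel> owner;
    std::vector<Pixel>& ref = owner;  // like binding local() to a reference
    ref.resize(width);
    return owner.size();              // width
}
```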