I am learning to work with OpenCV and TBB. I want to learn how to process images in parallel because I have a multicore CPU and want to add multi-CPU support to my programs.
I have read the article "The Foundations for Scalable Multi-core Software in Intel® Threading Building Blocks" in the Intel® Technology Journal (the PDF is available here: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.71.8289&rep=rep1&type=pdf).
They use Fibonacci number calculation as an example of multiprocessing. There is a similar Fibonacci example among the TBB examples in the TBB package (see ParallelTaskFib). The only problem is that the calculation is so simple that it puts little load on the CPU, so when you run it on small numbers with a low CutOff it is not very efficient because the task overhead dominates. So, to learn to work with TBB, I need a more practical example from image processing.
In my concept I would like to use the TBB Task Scheduler. I started with the class FibTask and the function ParallelFib, which I renamed and whose arguments I changed to work with vectors of images. The basic principle of the design should stay untouched. The Fibonacci example includes only two children, called a and b. Now the problem is that I am not sure whether I can use more than two children in one function of matTask (the one called 'execute'). So I have tried to add more children, more pointers, and more spawn_and_wait_for_all() calls...
At this stage I have not created any image processing functions, because I want to ask whether this design is correct and whether there would be any performance problems. It is not finished. I will wait for your suggestions to fix possible mistakes in my concept.
My basic idea is to apply a filter such as Gaussian blur to lena.jpg. First I would pass the number of threads. I have 8 cores, so 8 threads is the maximum I can pass. I plan to split the Lena image into 8 strips of the same size and copy their pixels into vectors (8 basic vectors). Then they should be blurred. In the next stage I need to create another 7-8 images that overlap the margins of the 8 sections, and repeat only the blurring on them. Finally, one more pass is needed for the area that may remain at the end of the image (the remainder from source_image.rows()/8).
The main thing I need to solve (and do not know how) is how to stop the infinite loop. Should I create different classes and different methods for 1) copying, 2) blurring, 3) cropping, and 4) pasting? Or can I do everything (copy + blur) in one call? This is the difference from the Fibonacci example: that code did the same thing at every step, but I need to do several different things... So what should the logic be, how should I order things, and how should I name the functions?
An easier solution would be to use 8 strips of the same size... and then 7-8 overlay areas.
The code below prints no errors, but it is not supposed to return a correct result because it is just a temporary concept.
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
#include <iostream>
#include <stdlib.h>
#include <stdio.h>
#include <vector>
#include "tbb/task.h"
#include "tbb/task_scheduler_init.h"
#define CutOff 12
using namespace cv;
// Placeholder for the serial work done below the cutoff
void SerialAction(int n) {}
/**
**/
class matTask: public tbb::task {
public:
    int n;
    const int offset;
    std::vector<cv::Mat> main_layers;
    std::vector<cv::Mat> overlay_layers;
    // Members are initialized in declaration order
    matTask( std::vector<cv::Mat> main_layers_, std::vector<cv::Mat> overlay_layers_, int n_, const int offset_ ) :
        n(n_), offset(offset_),
        main_layers(main_layers_),
        overlay_layers(overlay_layers_)
    {}
    task* execute() {
        if( n<CutOff ) {
            SerialAction(n);
        }
        else {
            // Main layers - copy regions
            matTask& a = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n,0);
            matTask& b = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-1,0);
            matTask& c = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-2,0);
            matTask& d = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-3,0);
            matTask& e = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-4,0);
            matTask& f = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-5,0);
            matTask& g = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-6,0);
            matTask& h = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-7,0);
            // 8 children plus 1 for the wait itself
            set_ref_count(9);
            spawn( a );
            spawn( b );
            spawn( c );
            spawn( d );
            spawn( e );
            spawn( f );
            spawn( g );
            spawn_and_wait_for_all( h );
            // In the case of effect:
            // Overlay layers
            matTask& ab = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n,offset);
            matTask& bc = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-1,offset);
            matTask& cd = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-2,offset);
            matTask& de = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-2,offset);
            matTask& ef = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-2,offset);
            matTask& gh = *new( allocate_child() )
                matTask(main_layers,overlay_layers,n-2,offset);
            // ... + crop .. depends on size of kernel
            // 6 overlay children plus 1 for the wait itself
            set_ref_count(7);
            spawn( ab );
            spawn( bc );
            spawn( cd );
            spawn( de );
            spawn( ef );
            spawn_and_wait_for_all( gh );
        }
        return NULL;
    }
};
void ParallelAction( std::vector<cv::Mat> main, std::vector<cv::Mat> overlays, int n, const int offset ) {
    matTask& a = *new(tbb::task::allocate_root())
        matTask(main, overlays, n,offset);
    tbb::task::spawn_root_and_wait(a);
}
int main( int argc, char** argv )
{
    int threads = 8;
    std::vector<cv::Mat> main_layers;
    std::vector<cv::Mat> overlays;
    cv::Mat sourceImg;
    sourceImg = imread( "../../data/lena.jpg");
    if ( sourceImg.empty() )
        return -1;
    const int offset = (int) sourceImg.rows / threads;
    cv::setNumThreads(0);
    ParallelAction(main_layers, overlays, threads, offset );
    // GaussianBlur( src, dst, Size(3,3), 0, 0, BORDER_DEFAULT );
    return 0;
}
Edit: Reaction to Anton's answer. If I overload operator(), when exactly is operator() applied? Also, is it possible to add other methods to ApplyFoo? When () is overloaded, it seems there can be only one method.
#include "tbb/blocked_range.h"
void Foo(float a) {}
class ApplyFoo {
    float *const my_a;
public:
    // Processes the elements of one sub-range
    void operator()( const tbb::blocked_range<size_t>& r ) const {
        float *a = my_a;
        for( size_t i=r.begin(); i!=r.end(); ++i )
            Foo(a[i]);
    }
    ApplyFoo( float a[] ) :
        my_a(a) // initialize my_a
    {}
};