I am using Intel TBB parallel_for to speed up a for loop doing some calculations:
tbb::parallel_for(tbb::blocked_range<int>(0,ListSize,1000),Calc);
Calc is an object of the class doCalc
class DoCalc
{
vector<string>FileList;
public:
void operator()(const tbb::blocked_range<int>& range) const{
for(int i=range.begin(); i!=range.end();++i){
//Do some calculations
}
}
DoCalc(vector<string> ilist):FileList(ilist){}
};
It takes approx. 60 seconds when I use the standard serial form of the for loop and approx. 20 seconds when I use the parallel_for from TBB to get the job done. When using standard for, the load of each core of my i5 CPU is at approx. 15% (according windows task manager) and very inhomogeneous and at approx. 50% and very homogeneous when using parallel_for.
I wonder if it's possible to get an even higher core load when using parallel_for. Are there any other parameters except grain_size? How can I boost the speed of parallel_for without changing the operations within the for loop (here as //Do some calculations in the code sample above).