TBB parallelization of parsing with boost::spirit::qi

Question

In my program, I use Boost.Spirit.Qi to parse large data sets. The input data are sequential records. I am trying to use TBB to speed up the parsing. The parallel processing code is as follows:

typedef map<string, data_struct_t> mdata_t;
vector<string> text; 
mdata_t  data;

parallel_for(blocked_range<size_t>(0, input.size(), gs),
                     [&]  (blocked_range<size_t>& r) {
        data_struct_t cs;
        mdata_t cr;
        string s;
        for(size_t i=r.begin(); i<r.end(); i++) {
           s = text[i];         
           Parser::task1(s, cs); 
           Parser::task2(s, cs); 
           Parser::task3(s, cs);
        ....
           Parser::task8(s, cs);   
           cr.insert(std::make_pair(cs.title, cs));
        }
        data.insert(cr.begin(), cr.end());  

 }, ap);

My program uses only 10% of the CPU (2 CPUs, 16 cores) and runs on only 8 cores, i.e. a single processor. I do not understand why the remaining 8 cores are not used. I would be grateful if someone could point me to the correct way to parallelize this task.

Thanks for the advice.

Stan

score 0 · Accepted Answer · answered Jan 22 '15 at 14:46

Your input.size() might be too small, or gs too big, to create enough parallelism. Otherwise, if the number of threads is the concern, check the process (affinity) mask of your program when you start it and how TBB is initialized (e.g. whether tbb::task_scheduler_init is created with a small number of threads).
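To make the grain-size point concrete, here is a small sketch that estimates how many chunks a blocked_range of n elements is cut into when it is halved until a piece is no larger than the grain size, which is how blocked_range splits under tbb::simple_partitioner. count_chunks is a hypothetical helper written for illustration, not a TBB function. With auto_partitioner or affinity_partitioner TBB may stop splitting earlier, so treat the result as an upper bound.

```cpp
#include <cstddef>

// Hypothetical helper (not part of TBB): counts the leaf chunks produced by
// recursively halving a range of n elements while a piece is still larger
// than the grain size. This mirrors blocked_range's "size() > grainsize"
// divisibility test; real partitioners may split less.
std::size_t count_chunks(std::size_t n, std::size_t grain) {
    if (n == 0) return 0;          // empty range: nothing to schedule
    if (grain == 0) grain = 1;     // blocked_range requires grainsize > 0
    if (n <= grain) return 1;      // no longer divisible: one chunk
    const std::size_t left = n / 2;
    return count_chunks(left, grain) + count_chunks(n - left, grain);
}
```

With 100000 records, for example, count_chunks(100000, 10000) gives 16 chunks, which leaves no slack for load balancing on 16 cores, while count_chunks(100000, 100) gives 1024.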

As for CPU utilization, low usage is expected when your work is I/O-bound, e.g. reading a file. It is also possible that the time needed to complete one parallel iteration differs a lot from another. In that case, short iterations might complete even before all the threads are created. (If you want to measure speedup accurately, you should manually wait until all the threads are operational.)

Advice:

You have a bug with data.insert, since std::map is not safe for concurrent modification. Use tbb::concurrent_unordered_map, or use tbb::parallel_reduce to merge the partial results collected in cr by different threads.

The sequence Parser::task1(s, cs); ... Parser::task8(s, cs); can also be parallelized if the tasks do not share global state. See tbb::parallel_pipeline, which enables pipeline-style parallelism for a chain of such independent tasks.
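As a rough illustration of the data.insert advice, the sketch below shows the reduce pattern: every worker fills its own private map, and the partial maps are merged only after all workers have finished, so the shared map is never touched concurrently. It deliberately uses std::thread instead of TBB so that it compiles without extra dependencies; tbb::parallel_reduce follows the same idea with a per-range body and a join step. Record, parse_line and parse_all are hypothetical stand-ins for data_struct_t and the Parser::taskN calls, which the question does not show.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's data_struct_t.
struct Record {
    std::string title;
    std::size_t length = 0;
};

using Map = std::map<std::string, Record>;

// Hypothetical stand-in for Parser::task1..task8: parses "title=payload".
Record parse_line(const std::string& line) {
    Record r;
    const std::size_t eq = line.find('=');
    r.title = line.substr(0, eq);  // whole line if there is no '='
    r.length = (eq == std::string::npos) ? 0 : line.size() - eq - 1;
    return r;
}

Map parse_all(const std::vector<std::string>& text, unsigned workers) {
    if (workers == 0) workers = 1;
    std::vector<Map> partials(workers);  // one private map per worker
    std::vector<std::thread> pool;
    const std::size_t chunk = (text.size() + workers - 1) / workers;

    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(text.size(), w * chunk);
        const std::size_t end = std::min(text.size(), begin + chunk);
        pool.emplace_back([&text, &partials, w, begin, end] {
            Map& local = partials[w];  // only this thread writes to slot w
            for (std::size_t i = begin; i < end; ++i) {
                Record r = parse_line(text[i]);
                local.emplace(r.title, r);
            }
        });
    }
    for (std::thread& t : pool) t.join();

    // Join step: runs on one thread, after all workers are done.
    Map data;
    for (const Map& p : partials) data.insert(p.begin(), p.end());
    return data;
}
```

As in the question's code, insert keeps the first record seen for a duplicate title. If the merge step ever dominates the run time, that is the point at which a concurrent container becomes worth measuring.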

Many thanks, Anton. I cannot agree with a few of your comments, e.g.: input.size() = 100000, gs was tested in the range 1-10000, and I do not use tasks for several obvious reasons. I agree with you that parallel reduce or concurrent containers should be used. Do you know a link to a website with examples? — stansy, Jan 22 '15 at 15:39
@stansy, I just tried to cover all the possible causes of this behavior. If it is not the range and the number of tasks it can create, then it is either a small amount of work inside each iteration or some limit on the number of threads imposed via the affinity mask or the TBB API. Still, the number of threads is a separate issue from CPU utilization, which I covered above. You can find examples in the TBB package and on the Reference pages. — Anton, Jan 22 '15 at 15:53
Thanks @Anton. After several tests I found a solution to this problem. The conclusion is simple: always use iterators, never indices. If you are not doing complex calculations, using TBB containers does not significantly improve the computation time; you can use the standard containers with the same success. — stansy, Jan 25 '15 at 01:11
