std::_Lockit performance

Question

I have a small parser that I am trying to optimize.

It reads a bunch of files (a thousand in typical usage) and fills a vector<code_file>, where code_file is a struct that, among other things, contains the content of the file.

The second step (the longer one, and the one I'm trying to optimize) goes through the vector and calls parseCodeFile on each element, which fills in that code_file struct. No modification of the vector is involved.

for( auto it = code_files.begin(); it != code_files.end(); ++it )
{
    parseCodeFile( *it );
}

It seems reasonably parallelizable, but on my quad-core machine it actually performs worse if I use a parallel_for instead of a for. The parseCodeFile function does not do any locking.

I ran it under a profiler and noticed that it spends almost 50% of the time in std::_Lockit::_Lockit (25%) and std::_Lockit::~_Lockit (25%).

Is there a way to avoid this?

I saw this post: What std::_lockit does? But I run it in release, and the defines seem OK:

_SECURE_SCL=0
_HAS_ITERATOR_DEBUGGING=0
_ITERATOR_DEBUG_LEVEL=0
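A minimal compile-time check, assuming MSVC. After a standard library header such as <vector> is included, _ITERATOR_DEBUG_LEVEL holds the value the translation unit actually sees. A static_assert can therefore confirm the setting directly rather than relying on the project properties. The helper name iterator_debug_level is hypothetical. On non-MSVC toolchains the macro does not exist, so the check is skipped.

```cpp
#include <vector>  // any STL header pulls in the MSVC configuration macros

// Break the build if this translation unit still has iterator debugging on.
// Only active with MSVC; other compilers do not define the macro.
#if defined(_MSC_VER) && defined(_ITERATOR_DEBUG_LEVEL)
static_assert(_ITERATOR_DEBUG_LEVEL == 0,
              "iterator debugging is still enabled in this translation unit");
#endif

// Hypothetical helper: reports the level this translation unit sees
// (0 when the macro is absent, e.g. on libstdc++ or libc++).
inline int iterator_debug_level()
{
#if defined(_ITERATOR_DEBUG_LEVEL)
    return _ITERATOR_DEBUG_LEVEL;
#else
    return 0;
#endif
}
```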

Some additional information: the parsing function obviously does a lot of string manipulation, and it uses some Boost utilities such as boost::trim and boost::starts_with. It is compiled with Visual Studio 2010.
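One theory worth testing, though nobody in the thread confirmed it: boost::trim takes a std::locale parameter that defaults to std::locale(), so every call constructs a locale object. In MSVC's standard library, constructing a locale may take a global lock, which could appear as _Lockit in a profile. A quick experiment is to swap in a locale-free helper for the hot calls. The names is_ascii_space and trim_ascii below are hypothetical, and the sketch assumes that only ASCII whitespace needs trimming.

```cpp
#include <cstddef>
#include <string>

// Hypothetical: classifies ASCII whitespace without consulting any locale.
inline bool is_ascii_space(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' || c == '\v';
}

// Hypothetical locale-free replacement for boost::trim(s): removes leading
// and trailing ASCII whitespace in place.
inline void trim_ascii(std::string& s)
{
    std::size_t first = 0;
    while (first < s.size() && is_ascii_space(s[first]))
        ++first;
    std::size_t last = s.size();
    while (last > first && is_ascii_space(s[last - 1]))
        --last;
    s.erase(last);      // drop the trailing whitespace first
    s.erase(0, first);  // then the leading whitespace
}
```

If Unicode-aware trimming is needed, an alternative is to keep boost::trim but construct one std::locale per thread outside the loop and pass it as trim's second argument. That avoids building a new locale object on every call.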

I'd guess iostream; it is peppered with locks to make it thread-safe. It just wasn't designed with threading in mind, way back when. Using threads to read files is in itself a very bad idea: you send the disk read head back and forth, which is very expensive. – Hans Passant, Jun 11 '14 at 21:17
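Hans's comment suggests keeping file I/O sequential and streams out of the parallel part. The question's first step already stores each file's content. A minimal sketch of doing that with one bulk read per file on a single thread follows, so that the parse step touches only memory. The helper name read_whole_file is hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: loads a whole file with a single bulk read instead of
// many small stream operations. Meant to be called from one thread so the
// disk is read sequentially; parsing then works on the returned string.
inline std::string read_whole_file(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    in.seekg(0, std::ios::beg);
    std::string content(static_cast<std::size_t>(size), '\0');
    if (size > 0)
        in.read(&content[0], static_cast<std::streamsize>(size));
    return content;
}
```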

score 0 · Answer 1 · answered Jun 11 '14 at 20:04


Just a guess, since I can't see all your code, but note that as far as the compiler knows, code_files could change length inside parseCodeFile. If it reallocates, that's UB, but shrinking could be safe. MSVC++ just doesn't know, so it has to re-check .end() on every iteration.
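A minimal sketch of the loop with end() read once before iterating, which relies on the asker's statement that parseCodeFile never changes the vector's length. The code_file fields and the newline-counting body of parseCodeFile are hypothetical stand-ins, and parse_all is an illustrative name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the question's struct (only what this sketch needs).
struct code_file
{
    std::string content;
    long line_count;
};

// Hypothetical stand-in for parseCodeFile: counts newlines. It writes only
// into its own element and never inserts into or erases from the vector.
inline void parseCodeFile(code_file& file)
{
    file.line_count = static_cast<long>(
        std::count(file.content.begin(), file.content.end(), '\n'));
}

inline void parse_all(std::vector<code_file>& code_files)
{
    // end() is evaluated once. This is only correct because the loop body
    // never changes the size of code_files.
    for (std::vector<code_file>::iterator it = code_files.begin(), end = code_files.end();
         it != end; ++it)
    {
        parseCodeFile(*it);
    }
}
```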

Of course, that's all a bit irrelevant for parallel_for. You don't show that code, but it's documented as working on integral types only. Microsoft may describe it as STL-like, but it clearly isn't; an STL-like algorithm would support iterators.
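PPL's Concurrency::parallel_for (in <ppl.h>) takes a first index, a last index and a function object, so a loop written for it runs over indices into code_files rather than iterators. A portable sketch of such an index-based loop using straightforward partitioning follows. The name parallel_for_index is hypothetical, and std::thread requires C++11 (Visual Studio 2012 or later, not the asker's VS2010).

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical index-based parallel loop, straightforward partitioning:
// [first, last) is split into at most num_threads contiguous chunks and
// each chunk runs on its own std::thread. Assumes body does not throw.
inline void parallel_for_index(std::size_t first, std::size_t last, unsigned num_threads,
                               const std::function<void(std::size_t)>& body)
{
    if (last <= first)
        return;
    if (num_threads == 0)
        num_threads = 1;
    const std::size_t count = last - first;
    const std::size_t chunk = (count + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (std::size_t begin = first; begin < last; begin += chunk)
    {
        const std::size_t end = (begin + chunk < last) ? begin + chunk : last;
        workers.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i)
                body(i);
        });
    }
    for (auto& worker : workers)
        worker.join();  // body outlives all threads because we join here
}
```

For the question's loop, the body would be something like `[&](std::size_t i) { parseCodeFile(code_files[i]); }`. This is safe only because each index touches a distinct element.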

MSalters

Hm, no, code_files can't change length. Don't worry about correctness: it does work and produces the same result. I'm just wondering why it doesn't perform better with several threads. As for the parallel_for, it isn't Microsoft's implementation but my own. Microsoft's implementation performs a lot worse for some reason: 10% additional time with my parallel_for, 55% with Microsoft's. – foke Jun 11 '14 at 21:05
@foke: How do you tell **the compiler** not to worry? It will. And the performance of your `parallel_for` is something we can't judge without the code, of course. Are you using divide and conquer (while the range is large enough, split it in two and create a thread to process the second half) or straightforward partitioning (break the range into N parts and create N threads)? The motivation for the former is that you can have N/2 threads creating N/2 threads in parallel. – MSalters Jun 11 '14 at 21:18
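A minimal sketch of the divide-and-conquer scheme MSalters describes. While the range is larger than a grain size, split it in half, give the second half to a new thread, and recurse on the first half in the current thread. Thread creation is thus itself spread across threads. The name parallel_for_dc and the grain parameter are hypothetical. The code assumes C++11 std::thread and a body that does not throw.

```cpp
#include <cstddef>
#include <functional>
#include <thread>

// Hypothetical divide-and-conquer parallel loop over [first, last).
// Ranges of at most `grain` indices run serially in the calling thread.
// Assumes body does not throw (an exception here would skip the join).
inline void parallel_for_dc(std::size_t first, std::size_t last, std::size_t grain,
                            const std::function<void(std::size_t)>& body)
{
    if (last <= first)
        return;
    if (grain == 0)
        grain = 1;
    if (last - first <= grain)
    {
        for (std::size_t i = first; i < last; ++i)
            body(i);
        return;
    }
    const std::size_t mid = first + (last - first) / 2;
    // Second half on a new thread, which may split and spawn further.
    std::thread second_half([mid, last, grain, &body] {
        parallel_for_dc(mid, last, grain, body);
    });
    parallel_for_dc(first, mid, grain, body);  // first half in this thread
    second_half.join();
}
```

With 1000 files and a grain of 250, this produces four leaf ranges of 250 and creates three extra threads. Two of those threads are started by threads other than the caller.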
