I have a small parser that I am trying to optimize.
The first step reads a bunch of files (about a thousand in typical usage) and fills a vector<code_file>.
code_file is a struct that, among other things, holds the content of the file; a simplified sketch is below.
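For context, it looks roughly like this (the field names here are illustrative, the real struct has more members):

    #include <string>
    #include <vector>

    // Simplified illustration of the struct; the real one has more fields.
    struct code_file
    {
        std::string              path;     // where the file came from
        std::string              content;  // raw file content, read in step one
        std::vector<std::string> lines;    // filled in by parseCodeFile in step two
    };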
The second step (the longer one, and the one I'm trying to optimize) goes through the vector and calls parse on each file, which fills in the rest of the code_file struct. The vector itself is not modified:
for( auto it = code_files.begin(); it != code_files.end(); ++it )
{
parseCodeFile( *it );
}
It seems reasonably parallelisable, but it actually performs worse when I use a parallel_for instead of a plain for on my quad-core machine. The parseCodeFile function does not do any locking of its own.
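For reference, the parallel version I tried looks roughly like this (a sketch using concurrency::parallel_for_each, the element-wise variant of PPL's parallel_for; the exact call I used may differ slightly, and parseAllParallel is just an illustrative wrapper):

    #include <ppl.h>
    #include <vector>

    // Sketch of the parallel version: same per-file work,
    // just distributed over the vector with PPL.
    void parseAllParallel( std::vector<code_file>& code_files )
    {
        concurrency::parallel_for_each( code_files.begin(), code_files.end(),
            []( code_file& file )
            {
                parseCodeFile( file );  // no locking inside parseCodeFile
            } );
    }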
I ran it under a profiler and noticed that it spends almost 50% of the time in std::_Lockit::_Lockit (25%) and std::_Lockit::~_Lockit (25%).
Is there a way to avoid this?
I saw this post: What std::_lockit does? But I am building in Release, and the defines seem fine (a quick compile-time check is sketched after the list):
_SECURE_SCL=0
_HAS_ITERATOR_DEBUGGING=0
_ITERATOR_DEBUG_LEVEL=0
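For what it's worth, one way to confirm the effective level in a given translation unit is a preprocessor check like this (a sketch; _ITERATOR_DEBUG_LEVEL is defined by the MSVC standard headers):

    #include <vector>  // any standard header pulls in the MSVC config macros

    #if defined(_ITERATOR_DEBUG_LEVEL) && _ITERATOR_DEBUG_LEVEL != 0
    #error "_ITERATOR_DEBUG_LEVEL is not 0 in this translation unit"
    #endif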
Some additional information: the parsing function obviously does a lot of string manipulation, and it uses some Boost utilities such as boost::trim and boost::starts_with. Everything is compiled with Visual Studio 2010.
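To give an idea of the kind of work involved, the per-line handling looks roughly like this (a simplified, hypothetical sketch; the real parseCodeFile does much more, but it is this style of code):

    #include <boost/algorithm/string.hpp>
    #include <string>

    // Hypothetical example of the per-line string work done during parsing.
    void handleLine( std::string line )
    {
        boost::trim( line );                           // strip surrounding whitespace
        if( boost::starts_with( line, "#include" ) )
        {
            // record the include, etc.
        }
    }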