I am using a simple regex to match a string read from the OS that has the following format:
timestamp:{comma separated list of values}
where the timestamp and all of the values are unsigned integers.
To do this I was using the following regex, built with boost::xpressive:
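    // Needs <boost/xpressive/xpressive.hpp> and <boost/xpressive/regex_actions.hpp>.
    using namespace boost::xpressive;
    namespace xp = boost::xpressive;

    std::vector< uint32_t > idleReports;
    uint32_t timestamp = 0;
    // One value: a run of digits, appended to idleReports by a semantic action.
    sregex cpuIdleVal = ( +_d )[ push_back( xp::ref( idleReports ), as<unsigned>( _ ) ) ];
    // One list item: a value with or without a trailing comma.
    sregex cpuIdleData = cpuIdleVal >> "," | cpuIdleVal;
    // Whole line: timestamp, colon, then one or more list items.
    sregex fullMatch = ( +_d )[ xp::ref( timestamp ) = as<unsigned>( _ ) ]
                       >> ":" >> +cpuIdleData;
    smatch what;
    if( regex_match( test, what, fullMatch ) ) // test is the std::string read from the OS
    {
        // stuff
    }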
All works fine for the success case: benchmarking shows that the regex takes approximately 80 µs to match the following string:
"1381152543:988900,987661,990529,987440,989041,987616,988185,988346,968859,988919,859559,988967,991040,988942"
If one of the values in the input string is negative, the performance degrades significantly: if value4 is negative, the regex takes 13 seconds to report a failure.
If value5 is negative, the time taken is even longer.
Why is the performance so bad for failure cases?
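One possible explanation, offered as an assumption rather than something confirmed in this post: xpressive is a backtracking matcher, and cpuIdleData lets a run of digits be split in many ways (for example 988900 can be consumed as 9 then 88900, or as 98 then 8900, and so on), so before it can report failure the engine may have to try every split of every value that precedes the negative one. The sketch below does not use xpressive at all; countParses is a hypothetical helper that simply counts how many ways the text before the minus sign can be carved into cpuIdleData pieces.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of xpressive.
// Counts the distinct ways `s` can be split into pieces that each look like
// "digits," or "digits", i.e. the two alternatives of cpuIdleData.
std::uint64_t countParses(const std::string& s)
{
    // ways[i] = number of ways to split the first i characters of s.
    std::vector<std::uint64_t> ways(s.size() + 1, 0);
    ways[0] = 1;
    for (std::size_t end = 1; end <= s.size(); ++end) {
        // A piece ending at `end` may carry one trailing comma.
        const bool hasComma = (s[end - 1] == ',');
        const std::size_t digitsEnd = hasComma ? end - 1 : end;
        // Extend the piece leftwards one digit at a time; every start
        // position that leaves at least one digit is a valid piece.
        for (std::size_t start = digitsEnd; start > 0; --start) {
            if (!std::isdigit(static_cast<unsigned char>(s[start - 1]))) {
                break;
            }
            ways[end] += ways[start - 1];
        }
    }
    return ways[s.size()];
}
```

For six-digit values this count is 32 per value, so countParses("988900,987661,990529,") is 32768 and adding a fourth value makes it 1048576, which would be consistent with a negative value5 taking longer than a negative value4. The count is only a rough lower bound on the work a backtracking engine might do, not a measurement of xpressive itself.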
I have fixed the issue by changing the original regex to:
    sregex cpuIdleData = "," >> cpuIdleVal;
    sregex fullMatch = ( +_d )[ xp::ref( timestamp ) = as<unsigned>( _ ) ]
                       >> ":" >> cpuIdleVal >> -+cpuIdleData;
i.e. making the match against the comma-separated list non-greedy.
With the changed version, the failure scenarios perform just as well as (or slightly better than) the success scenarios.
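For comparison, a format this simple can also be checked in a single left-to-right pass with no regex engine, which rejects a minus sign as soon as it is reached. This is a swapped-in technique, not the xpressive approach used above; IdleReport, readUnsigned and parseIdleReport are hypothetical names, and unlike the first pattern this sketch rejects a trailing comma and any value that does not fit in 32 bits.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical result type for one "timestamp:v1,v2,..." line.
struct IdleReport {
    std::uint32_t timestamp = 0;
    std::vector<std::uint32_t> values;
};

// Reads one or more decimal digits starting at `pos` and advances `pos`
// past them. Returns false if there is no digit or the number overflows.
static bool readUnsigned(const std::string& s, std::size_t& pos, std::uint32_t& out)
{
    const std::size_t start = pos;
    std::uint64_t value = 0;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
        value = value * 10 + static_cast<std::uint64_t>(s[pos] - '0');
        if (value > std::numeric_limits<std::uint32_t>::max()) {
            return false;
        }
        ++pos;
    }
    if (pos == start) {
        return false; // e.g. a '-' where a digit was expected
    }
    out = static_cast<std::uint32_t>(value);
    return true;
}

// Parses the whole line in one pass. On failure `report` may hold the
// values read before the error, so callers should ignore it.
bool parseIdleReport(const std::string& s, IdleReport& report)
{
    std::size_t pos = 0;
    if (!readUnsigned(s, pos, report.timestamp)) return false;
    if (pos >= s.size() || s[pos] != ':') return false;
    ++pos;
    for (;;) {
        std::uint32_t v = 0;
        if (!readUnsigned(s, pos, v)) return false;
        report.values.push_back(v);
        if (pos == s.size()) return true;  // consumed the whole line
        if (s[pos] != ',') return false;   // anything but a comma is an error
        ++pos;
    }
}
```

Because each character is examined once and nothing is retried, the cost of a failure is bounded by the length of the line regardless of which value is negative.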