I'm working on a quasi-SCPI command parser and I want to split a string on colons, ignoring any colons that appear inside quoted strings. I want to get an empty string whenever there is no text between two colons.
If I use this regular expression in EditPad Pro 7.2.2, it does exactly what I want: (([^:\"']|\"[^\"]*\"|'[^']*')+)?
As an example, using this data string: :foo:::bar:baz
I get 6 hits: [empty],foo,[empty],[empty],bar,baz
So far, so good. However, in my code, using std::tr1::regex, I'm getting 9 hits with the same data string. It seems like I'm getting an extra empty hit after each non-empty hit.
    void RICommandState::InitRawCommandEnum(const std::string& full_command)
    {
        // Split string by colons, but ignore text within quotes.
        static const std::tr1::regex split_by_colon("(([^:\"']|\"[^\"]*\"|'[^']*')+)?");
        raw_command_list.clear();
        raw_command_index = 0;
        DebugPrintf(ZONE_REMOTE, (TEXT("InitRawCommandEnum FULL '%S'"), full_command.c_str()));
        const std::tr1::sregex_token_iterator end;
        for (std::tr1::sregex_token_iterator it(full_command.begin(),
                                                full_command.end(),
                                                split_by_colon);
             it != end;
             ++it)
        {
            raw_command_list.push_back(*it);
            const std::string temp(*it);
            DebugPrintf(ZONE_REMOTE, (TEXT("InitRawCommandEnum '%S'"), temp.c_str()));
        }
        DebugPrintf(ZONE_REMOTE, (TEXT("InitRawCommandEnum hits = %d"), raw_command_list.size()));
    }
And here is my output:
InitRawCommandEnum FULL ':foo:::bar:baz'
InitRawCommandEnum ''
InitRawCommandEnum 'foo'
InitRawCommandEnum ''
InitRawCommandEnum ''
InitRawCommandEnum ''
InitRawCommandEnum 'bar'
InitRawCommandEnum ''
InitRawCommandEnum 'baz'
InitRawCommandEnum ''
InitRawCommandEnum hits = 9
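In case it helps anyone reproduce this without my class and logging macros, here is a stripped-down sketch of the same loop. It assumes a compiler that ships std::regex (I have only run the std::tr1 version above), and the free function raw_hits is a name made up for this post, not part of my real code.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper for this post: returns every hit the token iterator
// produces, unfiltered, so the extra empty strings are easy to inspect.
std::vector<std::string> raw_hits(const std::string& full_command)
{
    // Same pattern as in InitRawCommandEnum, using std:: instead of std::tr1::.
    static const std::regex split_by_colon("(([^:\"']|\"[^\"]*\"|'[^']*')+)?");

    std::vector<std::string> hits;
    const std::sregex_token_iterator end;
    for (std::sregex_token_iterator it(full_command.begin(),
                                       full_command.end(),
                                       split_by_colon);
         it != end;
         ++it)
    {
        hits.push_back(it->str());
    }
    return hits;
}
```

Called with ":foo:::bar:baz", this should return the same nine strings listed above, provided std::regex iterates the same way the TR1 implementation does.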
The most important question is how can I get my regex search to yield one (and only one) hit for every token delimited by a colon? Is the problem with my search expression?
Or maybe I'm misinterpreting the results? Do the empty strings after the non-empty strings have a special meaning? If so, what? And if that's the case, then is the correct solution to simply ignore them?
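If ignoring them is the right answer, this is roughly the filter I have in mind. It is only a sketch under the assumption that each extra hit is a zero-length match sitting exactly where the previous non-empty match ended. The function name split_skipping_extra_empties is made up for this post, and it uses std::sregex_iterator (rather than the token iterator) so that match positions are available.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical workaround: keep every match except a zero-length match that
// starts exactly where the previous non-empty match ended.
std::vector<std::string> split_skipping_extra_empties(const std::string& full_command)
{
    static const std::regex token("(([^:\"']|\"[^\"]*\"|'[^']*')+)?");

    std::vector<std::string> tokens;
    bool have_prev = false;        // true once a non-empty match has been seen
    std::ptrdiff_t prev_end = 0;   // offset just past the last non-empty match

    const std::sregex_iterator end;
    for (std::sregex_iterator it(full_command.begin(), full_command.end(), token);
         it != end;
         ++it)
    {
        const std::smatch& m = *it;
        const std::ptrdiff_t pos = m.position(0);
        const std::ptrdiff_t len = m.length(0);

        // Assumed to be the "extra" hit: empty, and glued to the previous token.
        if (len == 0 && have_prev && pos == prev_end)
            continue;

        tokens.push_back(m.str(0));
        if (len > 0)
        {
            have_prev = true;
            prev_end = pos + len;
        }
    }
    return tokens;
}
```

With ":foo:::bar:baz" this gives the six tokens I expect, and a colon inside quotes stays part of its token. I have not thought through what should happen with an unterminated quote, so that case is untested.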
As a side question, I'm deeply curious why my code is behaving differently than EditPad Pro. EditPad is a useful test environment for experimenting with regular expressions, and it would be nice to know what the gotchas are.
Thanks!