Using VS 2010 with VsVim.
I'm finding that a search for quoted text, e.g.
/["][^"]\{0,\}["]
is not necessarily confined to a single line. That pattern is one of several attempts I made to find a quote, followed by zero or more non-quote characters, followed by another quote: for example "stuff", or "" (an empty string), or any other quoted string.
I haven't yet done an exhaustive analysis to determine what all of the multi-line matches have in common, though an escaped quote (\") seems to be fairly common.
I tried a number of more restrictive variations, but it wasn't until I added \n to the exclusion set that it stopped matching across lines.
In gVim, the search is, by default, confined to a single line.
I can see the advantage of having a search cross line-end boundaries, but that's not what I want as the default.
Is there a setting I missed?
Here's an example of a match from that regex:
oss << "bonus game conditions \"" << index << "\" not found for bonus game \""
<< bonusGameID << "\"";
That's two lines of code containing five matches:
"bonus game conditions \"
" << index << "
" not found for bonus game \"
then
"
<< bonusGameID << "
and
""
The match we're interested in is this one:
"
<< bonusGameID << "
That's the one that spans a line break.
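The five matches can be reproduced outside the editor. The sketch below uses C++ std::regex with its default ECMAScript grammar purely as a stand-in; it is an assumption that VsVim's engine (presumably a translation into .NET-style regular expressions) treats a negated character class the same way, although in both ECMAScript and .NET a class like [^"] does match a newline. Vim's \{0,\} is written as * here. The names find_all, kExample, kQuoted and kQuotedOneLine are hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: every non-overlapping match of `pattern` in `text`,
// scanned left to right, much like stepping through matches with "n".
std::vector<std::string> find_all(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);  // std::regex defaults to the ECMAScript grammar
    std::vector<std::string> matches;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        matches.push_back(it->str());
    }
    return matches;
}

// The two source lines from the example, joined by a real newline.
// Raw strings keep the backslashes in \" exactly as they appear in the file.
const std::string kExample =
    R"src(oss << "bonus game conditions \"" << index << "\" not found for bonus game \"")src"
    "\n"
    R"src(<< bonusGameID << "\"";)src";

// /["][^"]\{0,\}["] in ECMAScript syntax: the class [^"] also admits '\n'.
const std::string kQuoted = R"re("[^"]*")re";

// The same pattern with the newline added to the exclusion set.
const std::string kQuotedOneLine = R"re("[^"\n]*")re";
```

With kQuoted the fourth of the five matches is the quote at the end of the first line, the newline, and << bonusGameID << " on the second line. With kQuotedOneLine there are only four matches, none containing a newline: the last one shrinks to "\" on the second line.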
Here's another one, where the first quote is a character literal enclosed in apostrophes:
CharReplace( *it, wchar_t( '"' ), L"<DQ>" );
if ( !arg.empty() )
{
args.push_back( arg );
}
}
it++;
}
// Re-compose the string from words
WordsToString( words, delimiters, _str );
// Replace newlines with ICU specifiers (NOTE: CS3 uses '\r' instead of '\n' for a newline)
CharReplace( _str, wchar_t( '\n' ), L"<NL>" ) || CharReplace( _str, wchar_t( '\r' ), L"<NL>" );
The first match runs from the " inside '"' on the first line up to the L" before <DQ>. The second picks up at the " after <DQ> and continues across every intervening line until the opening quote of "<NL>" on the last line.
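To confirm how far the second match reaches, the lines above can be fed to the same kind of search. This is a sketch that again uses C++ std::regex (ECMAScript grammar) as an assumed stand-in for VsVim's engine; find_all, count_newlines and kSecondExample are hypothetical names, and the code lines are reproduced without indentation, exactly as quoted.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: all non-overlapping matches of `pattern` in `text`.
std::vector<std::string> find_all(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);  // ECMAScript grammar by default
    std::vector<std::string> matches;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        matches.push_back(it->str());
    }
    return matches;
}

// Hypothetical helper: how many real line breaks a match swallowed.
std::size_t count_newlines(const std::string& s) {
    return static_cast<std::size_t>(std::count(s.begin(), s.end(), '\n'));
}

// The second example; inside the raw string, '\n' and '\r' stay as backslash
// sequences, so only the actual line ends count as newlines.
const std::string kSecondExample = R"src(CharReplace( *it, wchar_t( '"' ), L"<DQ>" );
if ( !arg.empty() )
{
args.push_back( arg );
}
}
it++;
}
// Re-compose the string from words
WordsToString( words, delimiters, _str );
// Replace newlines with ICU specifiers (NOTE: CS3 uses '\r' instead of '\n' for a newline)
CharReplace( _str, wchar_t( '\n' ), L"<NL>" ) || CharReplace( _str, wchar_t( '\r' ), L"<NL>" );)src";
```

Searching kSecondExample for "[^"]*" gives three matches: the single-line "' ), L", then the one starting after <DQ> that crosses all eleven line breaks, then one between the two "<NL>" literals on the last line.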
What's happening isn't really a mystery. What's puzzling is the decision to allow the match to include the line breaks by default.
Okay, so I did a little more experimenting. Here's what I think is happening.
As nearly as I can tell, it's the [^"] sub-expression. It doesn't matter what I begin the expression with; if I write /_[^_]_/ I get similar behavior. A newline is, quite literally, "not a member of the set containing only an underscore". (Put the other way, it is implicitly a member of "everything that's not an underscore".)
In a normal search (/_.*_/, which greedily matches whatever lies between two underscores), the search stays on a single line (or, more precisely, doesn't cross a line break), because a newline is not a member of "zero or more of anything at all": the dot matches any character except end-of-line.
So a newline is not matched by .* but it is matched by [^_] or [^"], or whatever other set you choose to exclude.
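The same asymmetry between the dot and a negated class exists in other regex engines. This sketch uses C++ std::regex with its default ECMAScript grammar as an assumed stand-in (it says nothing about VsVim's internals): in that grammar the dot excludes line terminators, while a negated bracket expression such as [^_] does not. It uses _[^_]*_ (with a star) rather than the single-character /_[^_]_/ so the two underscores can sit on different lines; first_match and kTwoLines are hypothetical names.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: the first match of `pattern` in `text`, or "" if none.
std::string first_match(const std::string& text, const std::string& pattern) {
    std::smatch m;
    return std::regex_search(text, m, std::regex(pattern)) ? m.str() : std::string();
}

// Two underscores that sit on different lines.
const std::string kTwoLines = "a_b\nc_d";
```

Against kTwoLines, "_.*_" finds nothing because the dot stops at the newline, "_[^_]*_" happily matches "_b\nc_", and "_[^_\\n]*_" (the newline explicitly excluded) finds nothing again.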
To ensure the search does not cross a line break, the exclusion set must explicitly include \n (newline).
I have confirmed that gVim implicitly excludes newlines from the "not-one-of-these" set. To make gVim include the end-of-line, one must prepend \_ to the collection brackets, thus: \_[] (or include \n in the collection).
Conclusion: If this is a "feature" in VsVim, it should be something you can turn on and off. If it is unexpected behavior, could it be a bug?