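In your situation, rather than looking for a word boundary (\b) or a non-word boundary (\B), you might instead consider looking for whitespace (\s+), the beginning of the line (^), and the end of the line ($). A \b only matches between a word character (\w) and a non-word character (or the edge of the string). Since # and + are non-word characters, there is no word boundary between C# or C++ and a following space.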
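To make the difference concrete, here is a small sketch that runs a \b-based pattern and the whitespace-based pattern side by side. The helper name match_lang and the three sample strings are made up for illustration; they are not part of the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the captured language name, or undef
# if the pattern does not match the text.
sub match_lang {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

# \b needs a word character (\w) on one side and a non-word character
# (or the string edge) on the other. '#' and '+' are non-word
# characters, so there is no boundary between 'C#' and a space.
my $with_b  = qr/\b(C#|C\+\+)\b/;
my $with_ws = qr/(?:^|\s+)(C#|C\+\+)(?=\s+|$)/;

for my $text ('written in C# today', 'written in C++', 'C#x marks the spot') {
    for my $pair (['\b ', $with_b], ['\s+', $with_ws]) {
        my ($label, $re) = @$pair;
        my $got = match_lang($re, $text);
        printf "%-3s [%s] [%s]\n", $label, defined $got ? $got : '--', $text;
    }
}
```

On these inputs the \b pattern misses C# before a space and C++ at the end of the string, yet it does match the C# inside C#x, because there is a word boundary between # and x. The whitespace-based pattern behaves the other way around on all three.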
Here's a regex that will do that:
(?:^|\s+)(C#|C\+\+)(?=\s+|$)
And here's a Perl program that demonstrates that regex on a sample data set. (Also see the live demo.)
#!/usr/bin/perl -w
use strict;
use warnings;

while (<DATA>) {
    chomp;

    # A - Preceded by the beginning of the line or 1 or more whitespace
    #     characters
    # B - The character sequences 'C#' or 'C++'
    # C - Followed by 1 or more whitespace characters or the end of line.
    if (/(?:^|\s+)(C#|C\+\+)(?=\s+|$)/) {
    #     ^^^^^^^  ^^^^^^^^  ^^^^^^^
    #        A        B         C
        print "[$1] [$_]\n";
    } else {
        print "[--] [$_]\n";
    }
}

__END__
This program is written in C++ We'll delete it after ten days
This program is written in !C++ We'll delete it after ten days
This program is written in C++! We'll delete it after ten days
This program is written in C# We'll delete it after ten days
C# is the language this program is written in.
 C# is the language this program is written in.
C++ is the language this program is written in.
This program is written in C#
This program is written in C++
This program is written in C++!
Expected Output:
[C++] [This program is written in C++ We'll delete it after ten days]
[--] [This program is written in !C++ We'll delete it after ten days]
[--] [This program is written in C++! We'll delete it after ten days]
[C#] [This program is written in C# We'll delete it after ten days]
[C#] [C# is the language this program is written in.]
[C#] [ C# is the language this program is written in.]
[C++] [C++ is the language this program is written in.]
[C#] [This program is written in C#]
[C++] [This program is written in C++]
[--] [This program is written in C++!]