Regex not working when the search text contains a reserved character (C#)

Question

I have a regular expression that has worked for all my requirements until now. I have now hit strings that contain reserved characters, such as the + in C++ and the # in C#. The code below works for every word in my collection except C++ and C#:

MatchCollection matches= Regex.Matches(@"This  program is written in C# We'll delete it after ten days", @"\bC\+\+\b");
foreach(Match m in matches)
{
      Console.Write(m.Value);
}

Can anyone point out why?

@Tafari, that's because of how you set up the boundaries. But I'm not convinced those are necessary either. You'd need `\b ... \B`. – Mike Perrenoud, Dec 03 '13 at 14:08
@MichaelPerrenoud It's not my question, but yes, the problem is related to the boundaries, the closing one to be precise. It's better to use just whitespace `\s` instead of `\b`. – Tafari, Dec 03 '13 at 14:10
@Tafari, LOL, my apologies friend! I really need to get through this cup of coffee faster! – Mike Perrenoud, Dec 03 '13 at 14:11

score 3 · Answer 1 · answered Dec 03 '13 at 13:51

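You should use \B instead of \b for the second boundary. # is not a word character, so a \b placed after it matches only when a word character follows. \B matches when the # is followed by whitespace, another non-word character, or the end of the string.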

MatchCollection matches= Regex.Matches(@"This  program is written in C# We'll delete it after ten days", @"\bC\#\B");

For more information, see http://www.regular-expressions.info/wordboundaries.html
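The difference between the two boundaries can be checked with a small sketch. It is only an illustration: the helper name `Count` is made up for this example, and the top-level statements assume C# 9 or later. The last call shows the limitation raised in the comments below, namely that \B stops working for ordinary words that end in a word character.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for this sketch: number of matches of pattern in input.
static int Count(string input, string pattern) => Regex.Matches(input, pattern).Count;

string text = "This program is written in C# We'll delete it after ten days";

// '#' and the following space are both non-word characters,
// so there is no word boundary between them and \b fails.
Console.WriteLine(Count(text, @"\bC#\b"));   // prints 0

// \B matches where the neighbours are of the same kind (here: two non-word characters).
Console.WriteLine(Count(text, @"\bC#\B"));   // prints 1

// For a term that ends in a word character, \B before a space fails instead.
Console.WriteLine(Count("Microsoft visual basic delete it", @"\bbasic\B"));   // prints 0
```

So the closing boundary depends on the last character of the term: \b suits terms that end in a letter, digit, or underscore, and \B suits terms that end in a symbol.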

BasssS

Thanks, your answer is correct, but it does not fit my requirement. If I change my input text to "This program is written inC#We'll delete it after ten days", it should not match, since I need to match complete words only. – AnsariTanveer Dec 04 '13 at 12:44
Thanks, your answer is correct, but it does not fit my requirement if I change my code to MatchCollection matches = Regex.Matches(@"This program is written in Microsoft visual basic delete it after ten days", @"\bMicrosoft\ visual\ basic\B"); – AnsariTanveer Dec 04 '13 at 12:54

score 1 · Answer 2 · answered Dec 03 '13 at 14:05

You could use the following pattern, which stores the match in group 1:

PATTERN

\b(C(?:\+\+|#))\s

And this C# code:

CODE

MatchCollection matches = Regex.Matches(@"This  program is written in C# We'll delete it after ten days", @"\b(C(?:\+\+|#))\s");

foreach (Match m in matches)
{
     Console.Write(m.Groups[1].Value);
}

INPUT

This  program is written in C# We'll delete it after ten days

OUTPUT

C#

And

INPUT

This  program is written in C++ We'll delete it after ten days

OUTPUT

C++

score 0 · Answer 3 · answered Dec 03 '13 at 13:51

Below code work for all my word collection except for c++ and C#

For that match to work, you'd need a regex like @"(?:C\+\+)|(?:C#)", and here is a Regex 101 to prove it.

Mike Perrenoud

score 0 · Answer 4 · answered Dec 03 '13 at 15:28

In your situation, rather than looking for a word boundary (\b) or a non-word boundary (\B), you might instead consider looking for whitespace (\s+), beginning of the line (^), and end of the line ($).

Here's a regex that will do that:

(?:^|\s+)(C#|C\+\+)(?=\s+|$)

And here's a Perl program that demonstrates that regex on a sample data set. (Also see the live demo.)

#!/usr/bin/perl -w

use strict;
use warnings;

while (<DATA>) {
    chomp;

#   A - Preceded by the beginning of the line or 1 or more whitespace
#       characters
#   B - The character sequences 'C#' or 'C++'
#   C - Followed by 1 or more whitespace characters or the end of line.

    if (/(?:^|\s+)(C#|C\+\+)(?=\s+|$)/) {
#           ^^^^^  ^^^^^^^^    ^^^^^
#             A        B         C

        print "[$1] [$_]\n";
    } else {
        print "[--] [$_]\n";
    }
}

__END__
This program is written in C++ We'll delete it after ten days
This program is written in !C++ We'll delete it after ten days
This program is written in C++! We'll delete it after ten days
This program is written in C# We'll delete it after ten days
C# is the language this program is written in.
 C# is the language this program is written in.
C++ is the language this program is written in.
This program is written in C#
This program is written in C++
This program is written in C++!

Expected Output:

[C++] [This program is written in C++ We'll delete it after ten days]
[--] [This program is written in !C++ We'll delete it after ten days]
[--] [This program is written in C++! We'll delete it after ten days]
[C#] [This program is written in C# We'll delete it after ten days]
[C#] [C# is the language this program is written in.]
[C#] [ C# is the language this program is written in.]
[C++] [C++ is the language this program is written in.]
[C#] [This program is written in C#]
[C++] [This program is written in C++]
[--] [This program is written in C++!]
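
Since the question is about C#, the same whitespace-delimited regex can be sketched in .NET as well. This is only an illustration under a few assumptions: the helper name `FindTerms` and the term list are made up for this example, the alternation is built with `Regex.Escape` so that `+`, `#`, and spaces in a term are treated literally, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns each listed term that occurs in the text
// as a whole, whitespace-delimited token.
static List<string> FindTerms(string text, params string[] candidates)
{
    // Regex.Escape neutralises +, #, spaces and other metacharacters in each term.
    string alternation = string.Join("|", candidates.Select(Regex.Escape));

    // Same shape as the regex above: start of input or whitespace before the term,
    // whitespace or end of input after it. Group 1 holds the term itself.
    string regex = @"(?:^|\s+)(" + alternation + @")(?=\s+|$)";

    return Regex.Matches(text, regex)
                .Cast<Match>()
                .Select(m => m.Groups[1].Value)
                .ToList();
}

string[] languages = { "C#", "C++", "Microsoft visual basic" };

// A term surrounded by whitespace is found.
Console.WriteLine(string.Join(", ", FindTerms("This program is written in C# We'll delete it", languages)));   // prints C#

// A multi-word term works the same way.
Console.WriteLine(string.Join(", ", FindTerms("This program is written in Microsoft visual basic delete it", languages)));   // prints Microsoft visual basic

// A term glued to neighbouring text, or followed by punctuation, is rejected.
Console.WriteLine(FindTerms("This program is written inC#We'll delete it", languages).Count);   // prints 0
Console.WriteLine(FindTerms("This program is written in C++!", languages).Count);   // prints 0
```

Because the delimiters are whitespace rather than \b or \B, the same pattern handles terms that end in a symbol and terms that end in a letter. The trade-off, as in the Perl output above, is that a term followed directly by punctuation such as `C++!` is not matched.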
