
I have some HTML text. When I display it, I want to highlight some keywords. I don't want a keyword to match when it is part of an HTML tag or of a special character (an HTML entity).

For example, my HTML text: Hello  Welcome to my Spa No. 160

My keywords: spa 160

For highlighting I use keyword

But now it is matching the "spa" inside the tag and the "160" inside the special character.

How can I overcome this? I use C# Regex.

I need a regex that matches the keywords, but not inside tags or special characters.

nathanchere
Ravishankar N
    The general advice is: don't parse (X)HTML with regex. For a good explanation, see [this question](http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not). For the comedy explanation, see [this question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Stephen J. Anderson Jul 26 '13 at 07:28

1 Answer


There is no way you can overcome this using regex alone; regular expressions are not made for that. What you can do is use an XML parser (since HTML is XML based), extract what you need, and then do further manipulation with regular expressions and other tools.

For highlighting keywords, operations, special characters, etc., you can create a parser using a grammar-generating tool such as GoldParser, and then use the visitor pattern to implement highlighting and many other operations.

But remember that HTML is fairly complicated, and writing a grammar for it will give you a headache. Because of that, I recommend using an existing XML parsing tool. Search the net and you will find many; choose the one that suits your needs best. Take a look at the Html Agility Pack.
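As a rough illustration of the "parse first, then match" idea, here is a minimal sketch. It is written in Python with the standard-library `html.parser` only so that it runs without extra packages; it is not C# and it does not use the Html Agility Pack, although the same loop should carry over to any HTML parser that hands you text separately from tags and entities. The names `KeywordHighlighter` and `highlight`, the `<mark>` wrapper, and the sample markup are all made up for the example.

```python
import re
from html.parser import HTMLParser


class KeywordHighlighter(HTMLParser):
    """Re-emit HTML unchanged, wrapping keyword hits found in text only."""

    def __init__(self, keywords, open_tag="<mark>", close_tag="</mark>"):
        # convert_charrefs=False makes the parser report entities such as
        # &#160; as separate tokens, so the keyword regex never sees them.
        super().__init__(convert_charrefs=False)
        alternatives = "|".join(re.escape(k) for k in keywords)
        self.pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
        self.open_tag = open_tag
        self.close_tag = close_tag
        self.in_raw_text = False  # True while inside <script> or <style>
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.in_raw_text = True
        self.out.append(self.get_starttag_text())  # original tag text, untouched

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.in_raw_text = False
        self.out.append("</" + tag + ">")

    def handle_entityref(self, name):
        self.out.append("&" + name + ";")  # named entity, e.g. &nbsp;

    def handle_charref(self, name):
        self.out.append("&#" + name + ";")  # numeric entity, e.g. &#160;

    def handle_comment(self, data):
        self.out.append("<!--" + data + "-->")

    def handle_decl(self, decl):
        self.out.append("<!" + decl + ">")

    def handle_data(self, data):
        if self.in_raw_text:
            self.out.append(data)  # leave script/style content alone
            return
        # Only plain text reaches this point, so a regex is safe here.
        self.out.append(self.pattern.sub(
            lambda m: self.open_tag + m.group(0) + self.close_tag, data))


def highlight(html, keywords):
    """Return html with every keyword occurrence in text wrapped in <mark>."""
    parser = KeywordHighlighter(keywords)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# Hypothetical input: "spa" also occurs inside <span>, "160" inside &#160;.
sample = '<span class="note">Hello</span>&#160;Welcome to my Spa No. 160'
print(highlight(sample, ["spa", "160"]))
# <span class="note">Hello</span>&#160;Welcome to my <mark>Spa</mark> No. <mark>160</mark>
```

The regex is still doing the keyword matching, but it only ever runs on the text the parser reports between tags and entities, which is what keeps `span` and `&#160;` out of reach.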

Swift
  • Parsing HTML with an XML parser will almost certainly fail. That's what a DOM manipulator is for. Parsing XHTML, on the other hand, will work. – Cole Tobin Jul 26 '13 at 08:07