I would like to replace a word in an HTML string with another word, but only where it appears as a whole word, not where it forms part of a longer word. The problem I am having is that either the HTML opening and closing tags (or other HTML elements) affect which words the regex matches, or the regex replaces parts of words.

    // Verbatim string (@"...") so the HTML can span several lines
    string PostTxt = @"<div>The <b>cat</b> sat on the mat, what a catastrophe.
     The <span>cat</span> is not allowed on the mat. This makes things complicated; the cat&nbsp must go! 
    </div><p>cat cat cat</p>";

    // Pattern to search for (this also matches the "cat" inside "catastrophe")
    string pattern = "cat";

    // Replacement string to use
    string replacement = "******";

    // Replace words
    PostTxt = Regex.Replace(PostTxt, pattern, replacement, RegexOptions.IgnoreCase);

I would like it to return:

<div>The <b>******</b> sat on the mat, what a catastrophe. The <span>******</span> is not allowed on the mat. This makes things complicated; the ******&nbsp must go! </div><p>****** ****** ******</p>

Any suggestions and help will be greatly appreciated.

Buckweed

1 Answer

This is a simplified version of the solution I implemented using Html Agility Pack (html-agility-pack.net). Regex on its own is not a solution to this problem, as Olivier Jacot-Descombes noted in a comment; see "Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms".

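    // Requires: using System.Text.RegularExpressions; using HtmlAgilityPack;

    // Verbatim string (@"...") so the HTML can span several lines
    string PostTxt = @"<div>The <b>cat</b> sat on the mat, what a catastrophe.
     The <span>cat</span> is not allowed on the mat. This makes things complicated; the cat must go! 
    </div><p>Cat cat cat</p>";

    HtmlDocument mainDoc = new HtmlDocument();
    mainDoc.LoadHtml(PostTxt);

    // Replacement string to use
    string replacement = "*****";

    // \b anchors the match to word boundaries, so "catastrophe" is left alone
    string pattern = @"\b" + Regex.Escape("cat") + @"\b";

    // SelectNodes returns null when nothing matches, hence the fallback
    var nodes = mainDoc.DocumentNode.SelectNodes("//*") ?? new HtmlNodeCollection(null);

    foreach (var node in nodes)
    {
        // Note: InnerHtml includes the markup of child elements, so tag names
        // and attribute values are also subject to the replacement
        node.InnerHtml = Regex.Replace(node.InnerHtml, pattern, replacement, RegexOptions.IgnoreCase);
    }

    PostTxt = mainDoc.DocumentNode.OuterHtml;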
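
A possible variant, sketched here under a few assumptions rather than taken from the code above: because `InnerHtml` contains the markup of child elements, a whole-word match can also hit attribute values such as `class="cat"`. Restricting the replacement to text nodes avoids that. The class name `HtmlWordMasker` and the method `MaskWord` are hypothetical names used only for illustration, and the choice to skip text inside `script` and `style` elements is an assumption about what is wanted.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper for illustration only
public static class HtmlWordMasker
{
    public static string MaskWord(string html, string word, string replacement)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Whole-word pattern; Regex.Escape guards against regex metacharacters in the word
        string pattern = @"\b" + Regex.Escape(word) + @"\b";

        // Select text nodes only, so tag names and attributes are never touched.
        // Assumption: text inside <script> and <style> should be left alone.
        var textNodes = doc.DocumentNode.SelectNodes(
            "//text()[not(parent::script) and not(parent::style)]");

        // SelectNodes returns null when nothing matches
        if (textNodes == null)
        {
            return html;
        }

        foreach (var textNode in textNodes.OfType<HtmlTextNode>())
        {
            // A MatchEvaluator inserts the replacement literally, so characters
            // such as '$' in it are not treated as substitution patterns
            textNode.Text = Regex.Replace(
                textNode.Text, pattern, m => replacement, RegexOptions.IgnoreCase);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

It could be called as `PostTxt = HtmlWordMasker.MaskWord(PostTxt, "cat", "*****");`. Entities are not decoded in the text nodes, so a word written with entity-encoded characters would not be matched by this sketch.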
Buckweed