
I downloaded some data from a site and got this string: `Guangzhou R&amp;F`. As you can see, the string also contains `amp;`, but the correct string (as displayed on the site) is `Guangzhou R&F`.

So I tried to remove it using a regex. This is the expression I wrote:

```csharp
public static string RemoveHtml(string input)
{
    return Regex.Replace(input, @"<[^>]+>|&nbsp;", "").Trim();
}
```

The problem is that the regex doesn't remove the `amp;`. What did I do wrong?
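To see why the pattern misses `amp;`, here is a minimal sketch (the class name `EntityRegexDemo` and the method `RemoveAllEntities` are hypothetical, added only for illustration). The original alternation matches tags and the literal text `&nbsp;` and nothing else, so `&amp;` passes through untouched. A broader pattern that deletes every entity is not a fix either, because it removes the ampersand instead of restoring it.

```csharp
using System.Text.RegularExpressions;

public static class EntityRegexDemo
{
    // The pattern from the question: it matches tags (<...>) and the literal
    // text "&nbsp;", so any other entity such as "&amp;" is left unchanged.
    public static string RemoveHtml(string input)
    {
        return Regex.Replace(input, @"<[^>]+>|&nbsp;", "").Trim();
    }

    // Hypothetical "remove every entity" variant, shown only to illustrate
    // why removal is the wrong operation: it deletes the whole entity,
    // including the character it stands for.
    public static string RemoveAllEntities(string input)
    {
        return Regex.Replace(input, @"<[^>]+>|&#?\w+;", "").Trim();
    }
}
```

`RemoveHtml("Guangzhou R&amp;F")` returns the input unchanged, while `RemoveAllEntities` returns `Guangzhou RF`, which is also wrong: the entity needs to be decoded, not removed.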

Jidic
  • `&amp;` is the HTML encoding for an ampersand (`&`). It will render as `&`. – Stevo Aug 07 '18 at 09:41
  • Possible duplicate of [C#, function to replace all html special characters with normal text characters](https://stackoverflow.com/questions/2720684/c-function-to-replace-all-html-special-characters-with-normal-text-characters) – jao Aug 07 '18 at 09:43
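As the first comment says, `&amp;` is simply the encoded form of `&`. Here is a minimal sketch of the encode/decode round trip. It assumes `System.Net.WebUtility`, which is swapped in for the `System.Web.HttpUtility` class used in the answer, and the wrapper class `AmpersandEncoding` is hypothetical.

```csharp
using System.Net;

public static class AmpersandEncoding
{
    // HTML-encode text for embedding in a page: "&" becomes "&amp;",
    // "<" becomes "&lt;", and so on.
    public static string Encode(string text) => WebUtility.HtmlEncode(text);

    // Reverse the process: entity references (named like &amp;,
    // decimal like &#38;, hexadecimal like &#x26;) become the characters they encode.
    public static string Decode(string html) => WebUtility.HtmlDecode(html);
}
```

`Encode("Guangzhou R&F")` gives `Guangzhou R&amp;F`, and `Decode` turns that back into `Guangzhou R&F`. The decimal (`&#38;`) and hexadecimal (`&#x26;`) forms decode to the same ampersand.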

1 Answer


You don't need to replace or remove HTML entities manually; read about character encodings in HTML.

Here is the solution you need:

```csharp
System.Web.HttpUtility.HtmlDecode(input);
```
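If you also want to keep stripping tags as in the question, one way to combine that with decoding is sketched below. This is an assumption rather than part of the answer. It uses `System.Net.WebUtility.HtmlDecode`, which should behave the same as `HttpUtility.HtmlDecode` for input like this and does not need a reference to `System.Web`. The class name `HtmlText` is hypothetical.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Strip tags first, then decode entities. The order matters: decoding
    // first would turn escaped text such as "&lt;b&gt;" into "<b>", which
    // the tag pattern would then delete.
    public static string RemoveHtml(string input)
    {
        string withoutTags = Regex.Replace(input, @"<[^>]+>", "");
        return WebUtility.HtmlDecode(withoutTags).Trim();
    }
}
```

`HtmlText.RemoveHtml("<td>Guangzhou R&amp;F</td>")` returns `Guangzhou R&F`. One behavioral difference from the question's version: `&nbsp;` is now decoded to a non-breaking space (U+00A0) instead of being deleted.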
Yurii N.