
I have an HTML string with simple text:

<div><object height="315" width="560"></object><div><object height="315" width="560"></object></div></div>

How can I remove every occurrence of the <object> tag and anything inside it? I want to replace it with an empty string, so that everything from <object> to </object>, including the tags themselves, is removed.

piet.t
sensei
  • One does not use regex to parse non-regular languages. I will just refer to this: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Alex Jun 28 '17 at 11:42
  • What have you tried so far? The basic pattern should be fairly simple. I haven't used regex in the last few years, but it should be an easy one – Zohar Peled Jun 28 '17 at 11:42
  • HTML Parsers parse html best: http://html-agility-pack.net – Alex K. Jun 28 '17 at 11:43

2 Answers


Here you are:

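// requires: using System.Text.RegularExpressions;
var yourString = @"<div><object height=""315"" width=""560""></object><div><object height=""315"" width=""560""></object></div></div>";
// Lazy match from each <object ...> to the nearest </object>; Singleline lets '.' span line breaks.
yourString = Regex.Replace(yourString, @"<object\b.*?</object>", String.Empty, RegexOptions.Singleline | RegexOptions.IgnoreCase);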
teo van kot
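The comments above warn against regex for HTML. A minimal sketch of why, assuming a lazy pattern of the same kind as in the answer above: it handles the markup from the question, but leaves a stray closing tag when one <object> sits directly inside another. The class name ObjectTagDemo and the helper StripObjects are hypothetical names used only for this illustration, and the nested input is made up for the demo.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ObjectTagDemo
{
    // Hypothetical helper: removes each <object ...>...</object> span
    // using a lazy match that stops at the FIRST closing tag it finds.
    public static string StripObjects(string html)
    {
        return Regex.Replace(
            html,
            @"<object\b.*?</object>",
            string.Empty,
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Non-nested <object> elements, as in the question: both are removed.
        string flat = "<div><object height=\"315\" width=\"560\"></object>"
                    + "<div><object height=\"315\" width=\"560\"></object></div></div>";
        Console.WriteLine(StripObjects(flat));   // <div><div></div></div>

        // Made-up input with an <object> nested inside another <object>:
        // the match ends at the inner </object>, so the outer one is left behind.
        string nested = "<div><object><object></object></object></div>";
        Console.WriteLine(StripObjects(nested)); // <div></object></div>
    }
}
```

If the input can contain nested <object> elements, a parser-based approach avoids this problem.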

If you need to parse or modify HTML, I recommend a real HTML parser like HtmlAgilityPack:

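// requires the HtmlAgilityPack NuGet package
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
// SelectNodes returns null (not an empty collection) when nothing matches
var objectNodes = doc.DocumentNode.SelectNodes("//object");
if (objectNodes != null)
    foreach (var node in objectNodes)
        node.Remove();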

// if you need it as string:
var writer = new StringWriter();
doc.Save(writer);
html = writer.ToString();

The result is (the <object> inside the nested div is removed as well, as desired):

<div><div></div></div>
Tim Schmelter