Regex for removing an HTML tag

Question

I have a simple HTML string:

<div><object height="315" width="560"></object><div><object height="315" width="560"></object></div></div>

How can I remove every occurrence of the <object> tag and anything inside it? That is, everything from <object> to </object>, including the tags themselves, should be replaced with an empty string.

One does not use regex to parse non-regular languages. I will just refer to this: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — Alex, Jun 28 '17 at 11:42
What have you tried so far? The basic pattern should be fairly simple. I didn't use regex in the last few years, but it should be an easy one — Zohar Peled, Jun 28 '17 at 11:42

teo van kot · Answer 1 · 2017-06-28T11:54:45.107

1

Here you are:

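// requires: using System.Text.RegularExpressions;
var yourString = @"<div><object height=""315"" width=""560""></object><div><object height=""315"" width=""560""></object></div></div>";
// \b keeps tags such as <objectfoo> from matching; Singleline lets . span line breaks
yourString = Regex.Replace(yourString, @"<object\b.*?</object>", String.Empty, RegexOptions.Singleline | RegexOptions.IgnoreCase);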
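A lazy pattern like this copes with sibling <object> elements, as in the question, but not with an <object> nested inside another <object>. The following is only a minimal sketch of that limitation; the class and method names (ObjectTagStripper, Strip) are hypothetical and exist just for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of any library.
public static class ObjectTagStripper
{
    // Removes everything from "<object" up to the nearest following "</object>".
    // Singleline lets '.' match line breaks; IgnoreCase also catches <OBJECT>.
    public static string Strip(string html)
    {
        return Regex.Replace(
            html,
            @"<object\b.*?</object>",
            string.Empty,
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
    }
}
```

Calling Strip on the markup from the question gives `<div><div></div></div>`. Calling it on `<object><object></object></object>` leaves a stray `</object>` behind, because the lazy match stops at the first closing tag. If nested <object> elements can occur in your input, a parser is the safer choice.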

edited Jun 28 '17 at 11:54

answered Jun 28 '17 at 11:47


score 1 · Answer 2 · answered Jun 28 '17 at 11:51

If you need to parse or modify HTML, I recommend a real HTML parser such as HtmlAgilityPack:

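// requires: using System.IO; using System.Linq;
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
// SelectNodes can return null when nothing matches, so fall back to an empty sequence
var objectNodes = doc.DocumentNode.SelectNodes("//object") ?? Enumerable.Empty<HtmlAgilityPack.HtmlNode>();
foreach (var node in objectNodes)
    node.Remove();

// if you need it as string:
var writer = new StringWriter();
doc.Save(writer);
html = writer.ToString();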

The result is (the <object> inside the nested div is removed as well, as desired):

<div><div></div></div>
