Removing DIV from a text file if it contains a certain classname

Question

I am currently working with an XML document that contains RSS feeds. I want to parse it so that if a div tag with the class name "feedflare" is found, the code removes the whole DIV.

I could not find an example of doing this as the search for it is polluted with "HTML editor errors" and other irrelevant data.

Would anyone here be kind enough to share a method for reaching my goal?

I must state that I DO NOT want to use HtmlAgilityPack if I can avoid it.

This is my process:

Load the XML, parse through the elements and pick out Title, Description and Link. Then save all of this as HTML (with tags added programmatically to build a web page). Once all of the tags are added, I want to parse the resulting "HTML text" and remove the annoying DIV tag.

Let's assume "string HTML = textBox1.Text", where textBox1 is where the resulting HTML is pasted after parsing the main XML document.

How would I then loop through the contents of textBox1.Text and remove ONLY the div tag with the class "feedflare" (see below)?

<div class="feedflare">
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:yIl2AUoC8zA">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?d=yIl2AUoC8zA" border="0"></img></a> 
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:H0mrP-F8Qgo">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?d=H0mrP-F8Qgo" border="0"></img></a> 
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:D7DqB2pKExk">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?i=lB-zYAGjzDU:1zqeSgzxt90:D7DqB2pKExk" border="0"></img></a> 
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:V_sGLiPBpWU">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?i=lB-zYAGjzDU:1zqeSgzxt90:V_sGLiPBpWU" border="0"></img></a>
</div>

Thank you in advance.

Do you want to remove only the `div` tag, or everything between `<div>` and `</div>`? — Harry89pl, Jun 22 '12 at 13:33
@harry180 If you read first paragraph of post, it says `the code would remove the whole DIV` — Chuck Savage, Jun 22 '12 at 15:48
It would probably be helpful to explain why you don't want to use the HtmlAgilityPack. It would also be helpful to have a complete example. — NotMe, Jul 11 '12 at 00:52

score 0 · Accepted Answer · answered Jun 22 '12 at 13:35

Using this xml library, do:

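XElement root = XElement.Load(file); // or XElement.Parse(string)
// XPathElement is an extension method from the linked library, not part of System.Xml.Linq
XElement div = root.XPathElement("//div[@class={0}]", "feedflare");
if (div != null) // guard: calling Remove() on a null result throws NullReferenceException
    div.Remove();
root.Save(file); // or string result = root.ToString();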
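
If you would rather not pull in an extra library, roughly the same thing can be sketched with plain LINQ to XML (System.Xml.Linq). This is only a sketch under a few assumptions: the input is well-formed XML with a single root element, the div elements are not in an XML namespace, and the class attribute is exactly "feedflare". The names FeedFlareXml and RemoveFeedFlare are made up for illustration.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class FeedFlareXml
{
    // Hypothetical helper: removes every <div class="feedflare"> element
    // from a well-formed XML/XHTML string and returns the remaining markup.
    public static string RemoveFeedFlare(string xml)
    {
        XElement root = XElement.Parse(xml);

        // Collect the matches into a list first, so that removing nodes
        // does not interfere with the lazy Descendants() enumeration.
        var matches = root.Descendants("div")
                          .Where(d => (string)d.Attribute("class") == "feedflare")
                          .ToList();

        // An empty list simply means nothing is removed; no null check needed.
        foreach (XElement div in matches)
        {
            div.Remove();
        }

        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

Something like `FeedFlareXml.RemoveFeedFlare(textBox1.Text)` would then return the markup without the feedflare divs. Because the matches come back as a (possibly empty) list, a document without any such div does not cause an exception.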

Chuck Savage

Thank you for the reply. However, I am getting a System.NullReferenceException ("Object reference not set to an instance of an object.") at `div.Remove();` when running this code: `XElement root = XElement.Parse(textBox1.Text); XElement div = root.XPathElement("//div[@class={0}]", "feedflare"); div.Remove(); string test = root.ToString(); MessageBox.Show(test);` – Meh Jun 22 '12 at 14:30

Where can I get your full xml to test it? – Chuck Savage Jun 22 '12 at 15:43

score 0 · Answer 2 · answered Jul 11 '12 at 00:42

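Try this:

   System.Xml.XmlDocument d = new System.Xml.XmlDocument();
   d.LoadXml(Your_XML_as_String);
   // Copy the matches first (needs using System.Linq;) so removal does not disturb the iteration
   var matches = d.SelectNodes("//div[@class='feedflare']").Cast<System.Xml.XmlNode>().ToList();
   foreach (System.Xml.XmlNode n in matches)
       n.ParentNode.RemoveChild(n); // remove from the node's actual parent, not from the document

Then use d.OuterXml to retrieve the new XML.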

Amged

score 0 · Answer 3 · answered Apr 05 '19 at 15:36

My solution in JavaScript is:

function unrichText(texto) {
  var n = texto.indexOf("\">"); //Finding end of "<div class="ExternalClass...">
  var sub = texto.substring(0, n+2); //Adding first char and last two (">)
  var tmp = texto.replace(sub, ""); //Removing it
  tmp = replaceAll(tmp, "</div>", ""); //Removing last "div"
  tmp = replaceAll(tmp, "<p>", ""); //Removing other stuff
  tmp = replaceAll(tmp, "</p>", "");
  tmp = replaceAll(tmp, "&#160;", "");
  return (tmp);
}

function replaceAll(str, find, replace) {
    return str.replace(new RegExp(find, 'g'), replace);
}
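
For the C# setting of the original question, a similar string-only idea could be sketched as below. It does not need the text to be well-formed XML. Instead it looks for the opening `<div class="feedflare"` and counts nested div tags until the matching `</div>` is reached. It is only a sketch: it assumes the opening tag is written exactly as in the question's sample (class as the first attribute, double quotes), it removes only the first such block, and the names FeedFlareText and RemoveFeedFlare are invented for illustration.

```csharp
using System;

public static class FeedFlareText
{
    // Hypothetical helper: removes the first <div class="feedflare"> ... </div>
    // block, including any <div> elements nested inside it. Returns the input
    // unchanged when no such block or no matching close tag is found.
    public static string RemoveFeedFlare(string html)
    {
        const string openMarker = "<div class=\"feedflare\"";
        int start = html.IndexOf(openMarker, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return html;

        int depth = 0;   // number of currently open <div> tags
        int pos = start; // scan position
        while (pos < html.Length)
        {
            int nextOpen = html.IndexOf("<div", pos, StringComparison.OrdinalIgnoreCase);
            int nextClose = html.IndexOf("</div>", pos, StringComparison.OrdinalIgnoreCase);
            if (nextClose < 0) return html; // unbalanced markup: leave it untouched

            if (nextOpen >= 0 && nextOpen < nextClose)
            {
                depth++;                    // another <div> opens before the next close
                pos = nextOpen + "<div".Length;
            }
            else
            {
                depth--;                    // a </div> closes the innermost open div
                pos = nextClose + "</div>".Length;
                if (depth == 0) return html.Remove(start, pos - start);
            }
        }
        return html;
    }
}
```

Calling it in a loop until the result stops changing would strip every feedflare block. Note that the plain `"<div"` search would also match other tag names that start with "div", so a real parser is the safer choice when the markup is not under your control.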
