1

I am working with an XML document that contains RSS feeds. I want to parse it so that whenever a div tag with the class name "feedflare" is found, the whole div is removed.

I could not find an example of how to do this, because the search results are polluted with "HTML editor errors" and other irrelevant hits.

Could anyone share a method for achieving this?

I must state that I DO NOT want to use HtmlAgilityPack if I can avoid it.

This is my process:

Load the XML, walk through the elements, and pick out the Title, Description, and Link. Then save all of this as HTML (with tags added programmatically to build a web page). Once all of the tags are added, I want to parse the resulting "HTML text" and remove the annoying div tag.

Let's assume "string HTML = textBox1.Text", where textBox1 holds the resulting HTML after the main XML document has been parsed.

How would I then go through the contents of textBox1.Text and remove ONLY the div with the class "feedflare" (see below)?

<div class="feedflare">
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:yIl2AUoC8zA">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?d=yIl2AUoC8zA" border="0"></img></a> 
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:H0mrP-F8Qgo">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?d=H0mrP-F8Qgo" border="0"></img></a> 
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:D7DqB2pKExk">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?i=lB-zYAGjzDU:1zqeSgzxt90:D7DqB2pKExk" border="0"></img></a> 
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:V_sGLiPBpWU">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?i=lB-zYAGjzDU:1zqeSgzxt90:V_sGLiPBpWU" border="0"></img></a>
</div>

Thank you in advance.

Chuck Savage
Meh

3 Answers

0

Using this XML library (which supplies the XPathElement extension method; it is not part of System.Xml.Linq), do:

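XElement root = XElement.Load(file); // or XElement.Parse(string)
XElement div = root.XPathElement("//div[@class={0}]", "feedflare");
// Guard against a missing match; calling Remove() on null throws NullReferenceException
if (div != null)
    div.Remove();
root.Save(file); // or string result = root.ToString();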
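
If pulling in an extra library is not an option, the same idea can be sketched with only the standard System.Xml.Linq types. This is a minimal sketch, not a definitive implementation: the class name FeedflareCleaner and the method RemoveFeedflare are hypothetical names chosen for illustration. It assumes the markup is well-formed XML with a single root element, that the div elements are in no XML namespace, and that the class attribute is exactly "feedflare".

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class FeedflareCleaner
{
    // Hypothetical helper: returns the markup with every <div class="feedflare"> removed.
    // Assumes well-formed XML; XElement.Parse throws XmlException otherwise.
    public static string RemoveFeedflare(string markup)
    {
        XElement root = XElement.Parse(markup);

        // ToList() takes a snapshot, so removing nodes does not disturb the query.
        // Descendants() does not include the root element itself.
        var divs = root.Descendants("div")
                       .Where(d => (string)d.Attribute("class") == "feedflare")
                       .ToList();

        foreach (XElement div in divs)
            div.Remove();

        // DisableFormatting keeps the remaining markup as it was, without re-indenting.
        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

With the question's setup, the call would look like string cleaned = FeedflareCleaner.RemoveFeedflare(textBox1.Text);. Because the list may be empty, there is no null to dereference when nothing matches; the markup simply comes back unchanged.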
Chuck Savage
  • Thank you for the reply. However, I am getting a NullReferenceException (System.NullReferenceException was unhandled; Message=Object reference not set to an instance of an object) when running this code: `XElement root = XElement.Parse(textBox1.Text); XElement div = root.XPathElement("//div[@class={0}]", "feedflare"); div.Remove(); /* exception thrown here */ string test = root.ToString(); MessageBox.Show(test);` – Meh Jun 22 '12 at 14:30
  • Where can I get your full XML to test it? – Chuck Savage Jun 22 '12 at 15:43
0

Try this:

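   System.Xml.XmlDocument d = new System.Xml.XmlDocument();
   d.LoadXml(Your_XML_as_String);
   // Remove matches one at a time; removing nodes while enumerating a live node list is unsafe
   System.Xml.XmlNode n;
   while ((n = d.SelectSingleNode("//div[@class='feedflare']")) != null)
       n.ParentNode.RemoveChild(n); // remove from the node's own parent, not from the document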

Then use d.OuterXml to retrieve the new XML.

Amged
0

My solution in JavaScript is:

function unrichText(texto) {
  // Assumes texto starts with an opening tag such as <div class="ExternalClass...">
  var n = texto.indexOf("\">"); // end of that opening tag
  var tmp = n === -1 ? texto : texto.substring(n + 2); // drop the tag, including its closing ">
  tmp = replaceAll(tmp, "</div>", ""); // remove every closing div tag
  tmp = replaceAll(tmp, "<p>", ""); // remove paragraph tags
  tmp = replaceAll(tmp, "</p>", "");
  tmp = replaceAll(tmp, "&#160;", ""); // remove non-breaking spaces
  return tmp;
}

function replaceAll(str, find, replace) {
  // Escape regex metacharacters so find is matched literally
  var escaped = find.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return str.replace(new RegExp(escaped, "g"), replace);
}