Given the following...

HtmlNode myDiv = doc.DocumentNode.SelectSingleNode("//div[@id='someid']");

...where the resulting myDiv.InnerHtml contains:

<span>...other content I want to consume...</span>
<a href="http://www.somewhere.com" onmousedown="return somefunc('random','parm','values','SHXA213')">Click Me</a>
<span>...and more content I want to consume...</span>

Is there a way to not select the onmousedown portion of the anchor tag?

Solution
What I needed to do was the following:

// the leading "." keeps the query relative to myDiv; a bare "//" searches the whole document
HtmlNodeCollection anchors = myDiv.SelectNodes(@".//a[@class='someclass']");
anchors[0].SetAttributeValue("onmousedown", "");

// could have also used anchors[0].Attributes.Remove() or .RemoveAt()
Clay
  • Could you explain a little more about what you are trying to accomplish? Why is the onmousedown bad in your scenario? If you're using the results of SelectSingleNode, you can ignore any child nodes/attributes that you don't care about. I have a feeling there is something I'm missing. – ckramer Jun 19 '10 at 00:28
  • Sure, I'm scraping some search results from Google. I need the InnerHtml contents pretty much as-is from a single div. The onmousedown throws javascript errors in the context of the site I'm consuming these results into. I figured SelectSingleNode would allow a way to ignore attributes but I have not been able to find the syntax yet. – Clay Jun 19 '10 at 00:34

1 Answer


Is there a way to not select the onmousedown portion of the anchor tag?

No. Not with XPath (SelectSingleNode).

XPath is a query language: it can only select nodes, and it cannot modify the nodes an expression selects. You need an additional layer (DOM manipulation or XSLT) to change nodes (e.g. to strip off attributes).
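As a rough illustration of the DOM route, here is a minimal sketch using HtmlAgilityPack: XPath only locates the anchors, and the attribute is then removed through the node's attribute collection. The sample markup, the class name StripOnMouseDown, and the `.//a[@onmousedown]` expression are assumptions made for the example, not taken from the original page.

```csharp
using System;
using HtmlAgilityPack;

class StripOnMouseDown
{
    static void Main()
    {
        // Hypothetical markup standing in for the scraped page.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div id='someid'>" +
            "<span>first</span>" +
            "<a href='http://www.somewhere.com' onmousedown=\"return somefunc('x')\">Click Me</a>" +
            "<span>second</span>" +
            "</div>");

        HtmlNode myDiv = doc.DocumentNode.SelectSingleNode("//div[@id='someid']");

        // XPath selects only; ".//" keeps the search relative to myDiv.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = myDiv.SelectNodes(".//a[@onmousedown]");
        if (anchors != null)
        {
            foreach (HtmlNode anchor in anchors)
            {
                // The modification happens on the DOM, not in the query.
                anchor.Attributes.Remove("onmousedown");
            }
        }

        // InnerHtml now serializes the anchors without the handler attribute.
        Console.WriteLine(myDiv.InnerHtml);
    }
}
```

Removing the attribute, rather than setting it to an empty string, avoids leaving an empty onmousedown behind in the serialized markup.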

Dimitre Novatchev
  • Thank you for confirming. I found what I was looking for and will update my post accordingly. – Clay Jun 19 '10 at 00:57