I have some text and at different positions of this text I have some HTML links such as <a href="link">text</a>.

I would like to convert it into [url=link]text[/url].

I know how to read the href, and the text alone, for instance:

var node = doc.DocumentNode.SelectSingleNode("//a");
string link = node.Attributes["href"].Value;
string text = node.InnerText;

But how can I put the converted link back into the text at the same place, without disturbing the surrounding text or losing its position?

Example:

The brown fox <a href="link">jumped over</a> the table while the rabbit <a href="link">escaped from it</a>.

Would become:

The brown fox [url=link]jumped over[/url] the table while the rabbit [url=link]escaped from it[/url].
Prix

1 Answer

Something like this:

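HtmlDocument doc = new HtmlDocument();
doc.Load(myTestFile);

// SelectNodes can return null when no <a href> elements are found.
var links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode node in links)
    {
        // Swap each <a> element for a text node holding its BBCode form.
        node.ParentNode.ReplaceChild(doc.CreateTextNode("[url=" + node.GetAttributeValue("href", null) + "]" + node.InnerHtml + "[/url]"), node);
    }
}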
Simon Mourier
  • If I wanted to do the same for images, taking into account that some images are inside links, would I need to create a new document once I replace the URLs, or is there a way to do both in one shot? I've tried doing both in one shot but couldn't make it work; perhaps there is a different XPath to deal with it. – Prix Aug 07 '12 at 16:31
  • For example, if I used `//a[@href]|//img[@src]` it would work for the first image but not the second in the following example: `text more text text ` – Prix Aug 07 '12 at 16:35
  • '//' means "all nodes from root", that's why it doesn't work. You can just duplicate the foreach loop with SelectNodes("//img[@src]"). You can't easily select both types of elements in one SelectNodes shot. – Simon Mourier Aug 07 '12 at 16:39
  • Great, thx. That was actually my fault: I had the `a[@href]` loop as the first foreach, which was why it wasn't working as expected hehehe :P silly me – Prix Aug 07 '12 at 16:43
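
The ordering issue from the comments can be sketched as two passes over the same document, images first and links second. This is only a minimal sketch, not the answerer's code: the `BbCodeConverter` class, the `ReplaceAll` helper and the `[img]...[/img]` output format are assumptions made for illustration, and it loads the HTML from a string with `LoadHtml` rather than from a file.

```csharp
using System;
using HtmlAgilityPack;

static class BbCodeConverter
{
    // Hypothetical helper: replaces every node matched by xpath
    // with a text node whose content is produced by format.
    static void ReplaceAll(HtmlDocument doc, string xpath, Func<HtmlNode, string> format)
    {
        // SelectNodes can return null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null) return;

        foreach (HtmlNode node in nodes)
        {
            node.ParentNode.ReplaceChild(doc.CreateTextNode(format(node)), node);
        }
    }

    public static string Convert(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Images first: once a link has been turned into a text node,
        // any <img> it contained is no longer an element the second
        // XPath query could find.
        // The [img]...[/img] form is an assumed BBCode convention.
        ReplaceAll(doc, "//img[@src]",
            n => "[img]" + n.GetAttributeValue("src", "") + "[/img]");

        // Links second: InnerHtml now already contains the converted image text.
        ReplaceAll(doc, "//a[@href]",
            n => "[url=" + n.GetAttributeValue("href", "") + "]" + n.InnerHtml + "[/url]");

        return doc.DocumentNode.OuterHtml;
    }

    static void Main()
    {
        Console.WriteLine(Convert(
            "The fox <a href=\"link\"><img src=\"pic.png\"></a> jumped over <img src=\"other.png\">."));
    }
}
```

Running the link pass first would be the situation described in the last comment: the nested image would be captured as raw markup inside the link's text node, and the image pass would never see it.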