
I'm working on a small project and have run into a problem; I hope you can help.

I have these few basic lines that load a given URL and pull out some tags:

var webGet2 = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = webGet2.Load(pattern);
var htmlMatches = doc.DocumentNode.SelectNodes("//li[@class=''] | //li[@class='f']");

Once I have the collection, I need to run a foreach loop that takes every href and src link and makes it valid. When I download the source, a link looks like /folder/folder/image.jpg, and I want to add http://www.site.com in front of each one.

I built this project with Regex before and had no problem doing that, but I can't work out how to do it with HTML Agility Pack.

Thank you!

Iliya Reyzis
  • possible duplicate of [C# Convert Relative to Absolute Links in HTML String](http://stackoverflow.com/questions/3836644/c-sharp-convert-relative-to-absolute-links-in-html-string) – Ani Jul 31 '12 at 20:07

1 Answer


So you want to search some nodes for certain attributes that contain relative urls and change them to absolute urls? You could do this:

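// Requires: using System; using System.Linq; using HtmlAgilityPack;

// Rewrites every attrName attribute found below root as an absolute URL.
static void AdjustAttributes(HtmlNode root, string baseUrl, string attrName)
{
    // Collect the matching attribute from each descendant that has one.
    var query =
        from node in root.Descendants()
        let attr = node.Attributes[attrName]
        where attr != null
        select attr;
    foreach (var attr in query)
    {
        var url = GetAbsoluteUrlString(baseUrl, attr.Value);
        attr.Value = url;
    }
}

// Returns url unchanged if it is already absolute; otherwise resolves it against baseUrl.
static string GetAbsoluteUrlString(string baseUrl, string url)
{
    var uri = new Uri(url, UriKind.RelativeOrAbsolute);
    if (!uri.IsAbsoluteUri)
        uri = new Uri(new Uri(baseUrl), uri);
    return uri.ToString();
}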
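
// Usage: load the page, select the nodes, then fix href and src inside each one.
var baseUrl = "http://www.site.com"; // the site root from the question
var web = new HtmlWeb();
var doc = web.Load(pattern);
var selectedNodes = doc.DocumentNode.SelectNodes("//li[@class=''] | //li[@class='f']");
if (selectedNodes != null) // SelectNodes returns null when nothing matches
{
    foreach (var node in selectedNodes)
    {
        AdjustAttributes(node, baseUrl, "href");
        AdjustAttributes(node, baseUrl, "src");
    }
}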
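
To see the URL-resolution step on its own, without HTML Agility Pack, here is a minimal sketch that relies only on System.Uri. The class name UrlFixer and the method name ToAbsolute are hypothetical, chosen for illustration. It uses the Uri(Uri, string) constructor, which resolves a relative link against the base and, when the link is already absolute, uses the link alone.

```csharp
using System;

// Hypothetical helper for illustration; not part of HTML Agility Pack.
public static class UrlFixer
{
    // Resolves link against baseUrl and returns the absolute form.
    // Throws UriFormatException if either string cannot be parsed.
    public static string ToAbsolute(string baseUrl, string link)
    {
        // The base must itself be an absolute URL.
        var baseUri = new Uri(baseUrl, UriKind.Absolute);

        // Relative links are combined with the base;
        // links that are already absolute pass through unchanged.
        return new Uri(baseUri, link).AbsoluteUri;
    }
}
```

With the values from the question, UrlFixer.ToAbsolute("http://www.site.com", "/folder/folder/image.jpg") should give http://www.site.com/folder/folder/image.jpg. A link without a leading slash is resolved against the base page's directory, so it is safer to pass the URL of the page you loaded as the base rather than only the site root.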
Jeff Mercado
  • Hi Jeff, thanks a lot. But I need to fix the links after I pull out the **li**s; I need the content. How can I do that? – Iliya Reyzis Jul 31 '12 at 21:39
  • Oh, I misinterpreted your question. I thought you were trying to get all those links. So your goal is to convert all relative URLs to absolute URLs. That should be simple. – Jeff Mercado Jul 31 '12 at 21:50
  • I was trying to do it as shown here - [link](http://htmlagilitypack.codeplex.com/wikipage?title=Examples&referringTitle=Home) but I'm having problems with **FixLink(att);**: the compiler keeps saying it does not contain a definition. So I tried replacing it with **this.AbsoluteUrlByRelative(att.Value);** as I saw elsewhere, and it keeps saying the same thing. I know I'm missing something, but I'm not sure what :\ – Iliya Reyzis Jul 31 '12 at 21:57
  • Those methods do not exist anywhere in those examples. Each one is just a placeholder for code that you're expected to provide to make the actual changes. – Jeff Mercado Jul 31 '12 at 22:26