
Has anyone created an open-source C# parser for the Web Linking HTTP "Link" header? See:
https://www.rfc-editor.org/rfc/rfc5988.

Example:

Link: <http://example.com/TheBook/chapter2>; rel="previous"; title="previous chapter"
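For reference, here is a minimal sketch of a quote-aware parser for headers like the one above. It assumes only the RFC 5988 shape shown here: an angle-bracketed URI followed by `;`-separated parameters, with several link-values separated by commas. It unquotes quoted values (including backslash escapes), but it does not validate URIs or decode `title*`. The `WebLink` and `LinkHeaderParser` names are hypothetical and do not come from any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical result type: one link-value from a Link header.
public sealed class WebLink
{
    public string Uri { get; }
    // Parameter names are lower-cased; quoted values are unquoted.
    public IReadOnlyDictionary<string, string> Parameters { get; }
    public WebLink(string uri, IReadOnlyDictionary<string, string> parameters)
    {
        Uri = uri;
        Parameters = parameters;
    }
}

public static class LinkHeaderParser
{
    // Parses e.g. <http://a/1>; rel="next", <http://a/9>; rel=last; title="a, b"
    public static List<WebLink> Parse(string header)
    {
        var links = new List<WebLink>();
        int i = 0;
        while (true)
        {
            // Link-values are separated by commas (plus optional whitespace).
            while (i < header.Length && (header[i] == ',' || IsSpace(header[i]))) i++;
            if (i >= header.Length) break;

            if (header[i] != '<')
                throw new FormatException($"Expected '<' at position {i}.");
            int close = header.IndexOf('>', i + 1);
            if (close < 0)
                throw new FormatException("Missing '>' after URI.");
            string uri = header.Substring(i + 1, close - i - 1);
            i = close + 1;

            // Read ";name=value" parameters until a comma or the end of input.
            var parameters = new Dictionary<string, string>();
            while (true)
            {
                SkipSpaces(header, ref i);
                if (i >= header.Length || header[i] != ';') break;
                i++;
                SkipSpaces(header, ref i);
                int nameStart = i;
                while (i < header.Length && "=;, \t".IndexOf(header[i]) < 0) i++;
                string name = header.Substring(nameStart, i - nameStart).ToLowerInvariant();
                SkipSpaces(header, ref i);
                string value = "";
                if (i < header.Length && header[i] == '=')
                {
                    i++;
                    SkipSpaces(header, ref i);
                    value = ReadValue(header, ref i);
                }
                // RFC 5988 says later occurrences of rel and title must be ignored;
                // this sketch keeps the first occurrence of every parameter.
                if (name.Length > 0 && !parameters.ContainsKey(name))
                    parameters[name] = value;
            }
            links.Add(new WebLink(uri, parameters));
        }
        return links;
    }

    // Reads a quoted-string (honouring backslash escapes) or a bare token.
    private static string ReadValue(string s, ref int i)
    {
        if (i < s.Length && s[i] == '"')
        {
            var sb = new StringBuilder();
            i++; // skip opening quote
            while (i < s.Length && s[i] != '"')
            {
                if (s[i] == '\\' && i + 1 < s.Length) i++; // quoted-pair
                sb.Append(s[i]);
                i++;
            }
            if (i < s.Length) i++; // skip closing quote
            return sb.ToString();
        }
        int start = i;
        while (i < s.Length && s[i] != ';' && s[i] != ',') i++;
        return s.Substring(start, i - start).TrimEnd();
    }

    private static bool IsSpace(char c) => c == ' ' || c == '\t';
    private static void SkipSpaces(string s, ref int i)
    {
        while (i < s.Length && IsSpace(s[i])) i++;
    }
}
```

Calling `LinkHeaderParser.Parse` on the example header yields one `WebLink` whose `Uri` is `http://example.com/TheBook/chapter2`, with parameters `rel` = `previous` and `title` = `previous chapter`. Scanning character by character, rather than splitting on `,` and `;`, keeps a quoted title such as `"a, b; c"` intact.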

Thanks.

Update: Ended up creating my own parser: https://github.com/JornWildt/Ramone/blob/master/Ramone/Utility/WebLinkParser.cs. Feel free to use it.

Jørn Wildt

3 Answers


Ended up creating my own parser: https://github.com/JornWildt/Ramone/blob/master/Ramone/Utility/WebLinkParser.cs. Feel free to use it.

Jørn Wildt

Here's an extension method I've used:

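using System.Collections.Generic;
using System.Net.Http;
using System.Text.RegularExpressions;

public static class LinkHeaderExtensions
{
    // Maps each rel value to its URL. Only matches link-values whose first
    // parameter is a quoted single-token rel, e.g. <url>; rel="next".
    public static Dictionary<string, string> ParseLinksHeader(
        this HttpResponseMessage response)
    {
        var links = new Dictionary<string, string>();

        if (!response.Headers.TryGetValues("Link", out var headers))
            return links;

        // A response can carry several Link fields, so scan every value.
        foreach (var header in headers)
        {
            // Whitespace around ';' is optional in the RFC 5988 grammar.
            var matches = Regex.Matches(
                header,
                @"<(?<url>[^>]*)>\s*;\s*rel=""(?<link>\w+)""");

            // Indexer instead of Add, so a repeated rel does not throw.
            foreach (Match m in matches)
                links[m.Groups["link"].Value] = m.Groups["url"].Value;
        }

        return links;
    }
}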
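If you stay with the regex approach, a variant sketch (hypothetical name `ParseLinkRelations`, not part of the answer above) can also cope with `rel` not being the first parameter, with unquoted `rel` values, and with quoted `rel` values listing several space-separated relation types, which RFC 5988 permits. It still assumes that no `<` occurs inside parameter values and that no quoted value contains the text `; rel=`.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.RegularExpressions;

public static class LinkHeaderRelExtensions
{
    // One link-value: <url> followed by everything up to the next '<'.
    // Simplification: assumes no '<' occurs inside parameter values.
    private static readonly Regex LinkValue =
        new Regex(@"<(?<url>[^>]*)>(?<params>[^<]*)");

    // A rel parameter in any position, either quoted (possibly several
    // space-separated types) or as a bare token.
    private static readonly Regex RelParam = new Regex(
        @";\s*rel\s*=\s*(?:""(?<rel>[^""]*)""|(?<rel>[^;,\s]+))",
        RegexOptions.IgnoreCase);

    // Hypothetical variant of ParseLinksHeader: maps every relation type to a URL.
    public static Dictionary<string, string> ParseLinkRelations(
        this HttpResponseMessage response)
    {
        // Registered relation types compare case-insensitively; this sketch
        // applies the same rule to extension (URI) types as a simplification.
        var links = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        if (!response.Headers.TryGetValues("Link", out var headers))
            return links;

        foreach (var header in headers)
        {
            foreach (Match link in LinkValue.Matches(header))
            {
                Match rel = RelParam.Match(link.Groups["params"].Value);
                if (!rel.Success) continue;

                var types = rel.Groups["rel"].Value.Split(
                    new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

                // Keep the first URL seen for each relation type.
                foreach (var type in types)
                    if (!links.ContainsKey(type))
                        links[type] = link.Groups["url"].Value;
            }
        }
        return links;
    }
}
```

For a header such as `<http://a/2>; title="page 2"; rel="next last", <http://a/0>;rel=prev` this yields three entries: `next` and `last` both map to `http://a/2`, and `prev` maps to `http://a/0`. A header can be attached to a locally built `HttpResponseMessage` with `Headers.TryAddWithoutValidation("Link", ...)` to try it without a network call.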
Seth Reno

Take the HTML Agility Pack and use the right SelectNodes query.

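using System;
using HtmlAgilityPack;

namespace WebScraper
{
    class Program
    {
        static void Main(string[] args)
        {
            // Page to scan; the URL is taken from the command line.
            string url = args[0];

            HtmlWeb web = new HtmlWeb();
            HtmlDocument doc = web.Load(url);

            // Select HTML <link> elements; SelectNodes returns null when nothing matches.
            var nodes = doc.DocumentNode.SelectNodes("//link[@rel]");
            if (nodes == null) return;

            foreach (HtmlNode link in nodes)
            {
                Console.WriteLine(link.GetAttributeValue("rel", "") + " " + link.GetAttributeValue("href", ""));
            }
        }
    }
}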
weismat
  • Thanks, but no, that's not what I am asking for. I am referring to the HTTP protocol-level "Link" header as described in http://tools.ietf.org/html/rfc5988. – Jørn Wildt Mar 16 '12 at 09:10
  • How are you getting your HTML documents? If it is a WebResponse, its Headers collection should contain the Link header. – weismat Mar 16 '12 at 09:40
  • No problem getting the string from the HTTP headers. The problem is parsing the content of the string: splitting it into parts consisting of URL, rel type, title and more. – Jørn Wildt Mar 16 '12 at 10:02
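The "title and more" part has one wrinkle worth a sketch: RFC 5988 also defines a `title*` parameter whose value uses the RFC 5987 ext-value encoding (`charset'language'percent-encoded-bytes`), so splitting out the parameter is not enough; the value still has to be decoded. The helper below (hypothetical name `ExtValue.Decode`) handles only the UTF-8 and ISO-8859-1 charsets, which RFC 5987 requires recipients to support, and passes a stray `%` through rather than rejecting it.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ExtValue
{
    // Decodes an RFC 5987 ext-value such as UTF-8'de'n%c3%a4chstes%20Kapitel
    // into its text and language tag. Hypothetical helper, not a library API.
    public static (string Text, string Language) Decode(string extValue)
    {
        int first = extValue.IndexOf('\'');
        int second = first < 0 ? -1 : extValue.IndexOf('\'', first + 1);
        if (second < 0)
            throw new FormatException("Expected charset'language'value.");

        string charset = extValue.Substring(0, first);
        string language = extValue.Substring(first + 1, second - first - 1);
        string encoded = extValue.Substring(second + 1);

        Encoding encoding;
        if (charset.Equals("UTF-8", StringComparison.OrdinalIgnoreCase))
            encoding = new UTF8Encoding(false, true); // throw on invalid byte sequences
        else if (charset.Equals("ISO-8859-1", StringComparison.OrdinalIgnoreCase))
            encoding = Encoding.GetEncoding(28591);   // Latin-1 code page
        else
            throw new NotSupportedException($"Unsupported charset '{charset}'.");

        // Percent-decode into raw bytes first, then decode them with the charset.
        var bytes = new List<byte>();
        for (int i = 0; i < encoded.Length; i++)
        {
            if (encoded[i] == '%' && i + 2 < encoded.Length)
            {
                bytes.Add(Convert.ToByte(encoded.Substring(i + 1, 2), 16));
                i += 2;
            }
            else
            {
                // attr-char is always ASCII, so a plain cast is enough here.
                bytes.Add((byte)encoded[i]);
            }
        }
        return (encoding.GetString(bytes.ToArray()), language);
    }
}
```

Decoding `UTF-8'de'n%c3%a4chstes%20Kapitel` gives the text `nächstes Kapitel` with language `de`. Percent-decoding to bytes before applying the charset matters because a single non-ASCII character in UTF-8 spans several `%XX` escapes.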