Questions tagged [html-agility-pack]

HTML Agility Pack is an open-source HTML parser that builds a read/write DOM and supports LINQ, plain XPath, or XSLT.

It is a .NET code library for parsing "out of the web" HTML files. The parser is very tolerant of malformed HTML. Its object model is very similar to the one System.Xml provides, but it works on HTML documents or streams.

The easiest way to install HTML Agility Pack is through its NuGet package:

Install-Package HtmlAgilityPack

Latest stable release: 1.11.3 / 18 April 2019

GitHub page: https://github.com/zzzprojects/html-agility-pack
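
Below is a minimal sketch of reading that DOM. It assumes the HtmlAgilityPack NuGet package is referenced. The class and method names (LinkExtractor, HrefsViaXPath, ImgSrcsViaLinq) are hypothetical and exist only for illustration. HtmlDocument, LoadHtml, SelectNodes, Descendants, Attributes and GetAttributeValue are members of the library itself. The sketch loads sloppy markup, collects <a href> values with XPath, and collects <img src> values with LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // from the HtmlAgilityPack NuGet package

// Hypothetical helper class, for illustration only.
public static class LinkExtractor
{
    // XPath route: select every <a> element that carries an href attribute.
    public static List<string> HrefsViaXPath(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parser: unclosed tags do not throw

        // By default SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return new List<string>();
        }
        return anchors.Select(a => a.GetAttributeValue("href", string.Empty)).ToList();
    }

    // LINQ route: walk all <img> descendants and keep those that have a src attribute.
    public static List<string> ImgSrcsViaLinq(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("img")
            .Where(img => img.Attributes["src"] != null) // indexer yields null if absent
            .Select(img => img.Attributes["src"].Value)
            .ToList();
    }

    public static void Main()
    {
        // Deliberately sloppy markup: missing </p>, an unquoted attribute, an unclosed <img>.
        const string html =
            "<p>Intro <a href=\"/first\">one</a> <a name=\"x\">no href</a>" +
            "<p><img src=logo.png> <a href='/second'>two</a>";

        foreach (string href in HrefsViaXPath(html))
        {
            Console.WriteLine("href: " + href);
        }
        foreach (string src in ImgSrcsViaLinq(html))
        {
            Console.WriteLine("src:  " + src);
        }
    }
}
```

By default, SelectNodes yields null rather than an empty HtmlNodeCollection when the XPath matches nothing. The null check guards against the resulting NullReferenceException. The Descendants route always returns a sequence, which may be empty.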

3466 questions
11 votes, 6 answers

How to get img/src or a/hrefs using Html Agility Pack?

I want to use the HTML Agility Pack to parse image and href links from an HTML page, but I just don't know much about XML or XPath. Though I have looked up help documents on many websites, I just can't solve the problem. In addition, I use C# in…
Asked by iShow (113 reputation; 1 gold, 1 silver, 5 bronze badges)

11 votes, 1 answer

Html Agility Pack: Find Comment Node

Using the Html Agility Pack, I am scraping a website that uses JavaScript to dynamically populate its content. Basically, I was searching for the XPath "\\div[@class='PricingInfo']", but that div node was being written to the DOM via…
Asked by Abe (6,386 reputation; 12 gold, 46 silver, 75 bronze badges)

11 votes, 1 answer

Determine the htmlnode name/type (eg li)

How does one know if the HtmlNode I'm working with is an <li>? I know the NodeType is an Element, but how do you determine if that is an <li>? Let me know if there's any more information you need.
Asked by shadonar (1,114 reputation; 3 gold, 16 silver, 40 bronze badges)

11 votes, 2 answers

How can I write out decoded HTML using HTMLAgilityPack?

I am having partial success in my attempt to write HTML to a DOCX file using HTMLAgilityPack and the DOCX library. However, the text I'm inserting into the .docx file contains encoded html such as: La ciudad de Los Ángeles (California) ha…
Asked by B. Clay Shannon-B. Crow Raven (8,547 reputation; 144 gold, 472 silver, 862 bronze badges)

11 votes, 2 answers

HTML Agility Pack

I'm trying to use HTML Agility Pack to get the description text from inside the: And someone on Stackoverflow a little while ago suggested I use…
Asked by jay_t55 (11,362 reputation; 28 gold, 103 silver, 174 bronze badges)

11 votes, 3 answers

How to pass cookies to HtmlAgilityPack or WebClient?

I use this code to login: CookieCollection cookies = new CookieCollection(); HttpWebRequest request = (HttpWebRequest)WebRequest.Create("example.com"); request.CookieContainer = new…
Asked by a1204773 (6,923 reputation; 20 gold, 64 silver, 94 bronze badges)

11 votes, 6 answers

Why can't I use htmlagilitypack with windows phone 8? What else can I use to Parse HTML in WP8?

Why can't I use htmlagilitypack with Windows Phone 8? It appears to be supported on all platforms, including Win8, Win8RT, WP7/WP7.5 and Silverlight 5. Is there one of the DLLs that would work? What else can I use to parse HTML in WP8? All…
Asked by user854534 (111 reputation; badge counts 1 and 3)

10 votes, 1 answer

SelectNodes with XPath ignoring cases

I have a problem using XPath to find elements that contain a certain string regardless of character casing. I want to find all the nodes in an HTML page whose id contains the text "footer", whether it is written in uppercase or lowercase. In my example I have…
Asked by vfportero (918 reputation; 1 gold, 13 silver, 26 bronze badges)

10 votes, 4 answers

Image tag not closing with HTMLAgilityPack

Using the HTMLAgilityPack to write out a new image node, it seems to remove the closing tag of an image, e.g. should be but when you check outer html, has . string strIMG = "…
Asked by mickyjtwin (4,960 reputation; 13 gold, 58 silver, 77 bronze badges)

10 votes, 1 answer

HtmlAgilityPack giving exception "Multiple node elments can't be created."

I have some input tags that are placeholders, which I am replacing with some HTML. I am using the code below to create an HTML node. But it gives the error "Multiple node elements can't be created" when there are no multiple…

10 votes, 1 answer

How to get title tag using HTML Agility Pack

I'm parsing an HTML file using HTML Agility Pack. I want to get <title>Some title</title>. As you see, title doesn't have a class. So I couldn't catch it no matter what I have tried. I couldn't find the solution on the web either. How can I catch…
Tags: c#, html, html-agility-pack
Asked Nov 29 '16 at 07:51 by jason (6,962 reputation; 36 gold, 117 silver, 198 bronze badges)

10 votes, 4 answers

Html Agility Pack help

I'm trying to scrape some information from a website but can't find a solution that works for me. Every code I read on the Internet generates at least one error for me. Even the example code at their homepage generates errors for me. My code: …
Tags: c#, html-agility-pack
Asked Oct 18 '10 at 20:46 by Victor Bjelkholm (2,177 reputation; 9 gold, 28 silver, 50 bronze badges)

10 votes, 1 answer

HtmlAgilityPack and Authentication

I have a method to get ids and xpaths if given a particular url. How do I pass in the username and password with the request so that I can scrape a url that requires a username and password? using HtmlAgilityPack; _web = new HtmlWeb(); internal…
Tags: c#, html-agility-pack, networkcredentials
Asked Apr 25 '14 at 16:34 by Jonathan Kittell (7,163 reputation; 15 gold, 50 silver, 93 bronze badges)

10 votes, 1 answer

How to fix ill-formed HTML with HTML Agility Pack?

I have this ill-formed HTML with overlapping tags: <p>word1<b>word2</p> <p>word3</b>word4</p> The overlapping can be nested, too. How can I convert it into well-formed HTML with HTML Agility Pack (HAP)? I'm looking for this…
Tags: c#, html, .net, parsing, html-agility-pack
Asked Mar 26 '14 at 12:51 by avo (10,101 reputation; 13 gold, 53 silver, 81 bronze badges)

10 votes, 2 answers

Html Agility Pack - New HtmlAttribute

Using Html Agility Pack in C#, I have a node I'd like to add an attribute to. Currently the node is an <li> element with no attributes, and I'd like to add a class of "active" to it. It looks like the best thing to use would be…
Tags: c#, html, html-agility-pack
Asked Aug 23 '13 at 10:31 by Tom Bowen (8,214 reputation; 4 gold, 22 silver, 42 bronze badges)