Questions tagged [html-agility-pack]

HTML Agility Pack is an open-source HTML parser that builds a read-and-write DOM and supports LINQ, plain XPath or XSLT.

It is a .NET code library that lets you parse HTML files from the web. The parser is very tolerant of malformed HTML. The object model is very similar to the one System.Xml offers, but for HTML documents or streams.

The easiest way to install HTML Agility Pack is through its NuGet package:

Install-Package HtmlAgilityPack
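As a rough illustration of the read-and-write DOM and the two query styles described above, the sketch below parses a slightly malformed string (an unquoted attribute value and an unclosed paragraph) and reads the same list items once through XPath and once through LINQ. The class, method, and sample markup are made up for the example; HtmlDocument, LoadHtml, SelectNodes, Descendants, and GetAttributeValue are the library's own members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet: Install-Package HtmlAgilityPack

public static class QuickStart
{
    // Hypothetical sample: unquoted attribute value and an unclosed <p>, which the parser tolerates.
    public const string SampleHtml =
        "<ul id=menu><li>Home</li><li>About</li></ul><p>Unclosed paragraph";

    // XPath style: SelectNodes returns null (not an empty collection) when nothing matches.
    public static List<string> MenuItemsViaXPath(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        var nodes = doc.DocumentNode.SelectNodes("//ul[@id='menu']/li");
        if (nodes != null)
        {
            foreach (var li in nodes)
                result.Add(li.InnerText.Trim());
        }
        return result;
    }

    // LINQ style: Descendants(name) yields an empty sequence when nothing matches.
    public static List<string> MenuItemsViaLinq(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("ul")
            .Where(ul => ul.GetAttributeValue("id", string.Empty) == "menu")
            .SelectMany(ul => ul.ChildNodes.Where(n => n.Name == "li"))
            .Select(li => li.InnerText.Trim())
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", MenuItemsViaXPath(SampleHtml)));
        Console.WriteLine(string.Join(", ", MenuItemsViaLinq(SampleHtml)));
    }
}
```

Both methods should return the same two items; the XPath version needs the explicit null check, while the LINQ version does not.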

Latest stable release: 1.11.3 / 18 April 2019

GitHub page: https://github.com/zzzprojects/html-agility-pack

3466 questions

22 votes, 3 answers

HTML Linq with HtmlAgilityPack, or alternative, in PCL

I have written a project on .NET 4 and am currently in the process of allowing it to run on Windows Phone as well. I am using HtmlAgilityPack, a well-known library which allows Linq queries over HTML, and am only using the LoadHtml and Linq…

21 votes, 3 answers

How to access child node from node in htmlagility pack

I loaded the…
Ajit • 309 • 1 • 3 • 8

21 votes, 1 answer

HTML Agility Pack get all anchors' href attributes on page

I am trying to add links extracted from an HTML file to a CheckBoxList (cbl_items). It works so far, but instead of the link, the item's name is displayed as HtmlAgilityPack.HtmlNode. I tried using DocumentElement instead of Node, but it said that it…
user3802921 • 241 • 1 • 2 • 9
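For the anchors question above, one minimal sketch (assuming the input is an HTML string; the class and method names are hypothetical) is to project each node to its href value before handing the result to a list control. The symptom described, items showing up as HtmlAgilityPack.HtmlNode, is consistent with the control displaying the node objects through their default ToString().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LinkExtractor
{
    // Hypothetical helper: returns the href value of every <a> element that has one.
    public static List<string> GetHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//a[@href]" skips anchors without an href attribute.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null) // SelectNodes returns null when nothing matches
            return new List<string>();

        // Bind these strings, not the HtmlNode objects, to the list control.
        return anchors
            .Select(a => a.GetAttributeValue("href", string.Empty))
            .ToList();
    }

    public static void Main()
    {
        foreach (var href in GetHrefs("<a href='/home'>Home</a><a name='top'>Top</a><a href=\"docs.html\">Docs</a>"))
            Console.WriteLine(href);
    }
}
```

GetAttributeValue takes a default value, so a missing attribute never throws; the XPath filter already excludes such anchors here, so the default is only a safety net.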

20 votes, 1 answer

HTML Agility Pack HtmlDocument Show All Html?

I am using the following to get a web page, which works fine: public static HtmlDocument GetWebPageFromUrl(string url) { var hw = new HtmlWeb(); return hw.Load(url); } But how do I spit the entire contents of the HTML out…
YodasMyDad • 9,248 • 24 • 76 • 121

20 votes, 1 answer

Html Agility Pack SelectSingleNode giving always same result in iteration?

I would like the nodes in the collection, but when iterating with SelectSingleNode I keep getting the same object; only node.Id is changing... What I am trying to do is read out the web response of a given site and catch some information like values, links .. in…
Mikatsu • 530 • 2 • 4 • 15

19 votes, 2 answers

How do I use HTML Agility Pack to edit an HTML snippet

So I have an HTML snippet that I want to modify using C#.
This is a specialSearchWord that I want to link to A hyperlink Some more text and that specialSearchWord again.
and I want to…
John • 3,332 • 5 • 33 • 55

19 votes, 3 answers

HtmlAgilityPack Post Login

I'm trying to log in to a site using HtmlAgilityPack (site:http://html-agility-pack.net). Now, I can't exactly figure out how to go about this. I've tried setting the Html form values…
Styles • 515 • 2 • 6 • 16

19 votes, 4 answers

How can I get html from page with cloudflare ddos protection?

I use htmlagility to get webpage data, but I have tried everything with pages using www.cloudflare.com protection for DDoS. The redirect page is not possible to handle in htmlagility because they don't redirect with meta or JS, I guess; they check if you…
ItalianOne • 221 • 1 • 2 • 8

19 votes, 5 answers

Html Agility Pack, SelectNodes from a node

Why does this pick all of my <li> elements in my document? HtmlWeb web = new HtmlWeb(); HtmlDocument doc = web.Load(url); var travelList = new List(); var liOfTravels = doc.DocumentNode.SelectSingleNode("//div[@id='myTrips']") …
thatsIT • 2,085 • 6 • 29 • 43
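The symptom in the question above matches a common XPath pitfall, although the excerpt is truncated, so this is an assumption about how the code continues: an expression starting with // is evaluated from the document root even when SelectNodes is called on a child node. Prefixing it with a dot (.//) makes it relative to that node. The sketch reuses the div[@id='myTrips'] selector from the excerpt; the class name and sample markup are hypothetical.

```csharp
using System;
using HtmlAgilityPack;

public static class RelativeXPath
{
    // Hypothetical helper: counts how many <li> elements "//li" and ".//li" match
    // when both are evaluated on the myTrips div.
    public static (int Absolute, int Relative) CountTripItems(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var trips = doc.DocumentNode.SelectSingleNode("//div[@id='myTrips']");
        if (trips == null)
            return (0, 0);

        // "//li" restarts at the document root: every <li> in the page matches.
        int absolute = trips.SelectNodes("//li")?.Count ?? 0;

        // ".//li" starts at the context node: only <li> elements inside the div match.
        int relative = trips.SelectNodes(".//li")?.Count ?? 0;

        return (absolute, relative);
    }

    public static void Main()
    {
        var counts = CountTripItems(
            "<div id='myTrips'><ul><li>Rome</li><li>Oslo</li></ul></div><ul><li>Other</li></ul>");
        Console.WriteLine($"//li: {counts.Absolute}, .//li: {counts.Relative}");
    }
}
```

The same rule applies to SelectSingleNode: an absolute expression called inside a loop keeps returning the first match in the whole document.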

18 votes, 2 answers

Alternatives to HtmlAgilityPack?

I don't like some of the design decisions made in HtmlAgilityPack: When using SelectNodes, if no nodes are found, it returns null rather than an empty set, so you can't just foreach over it without a null check. When trying to select children with…
mpen • 272,448 • 266 • 850 • 1,236
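The first complaint in the question above can be worked around without switching libraries. A minimal sketch, assuming all you want is an empty sequence instead of null, is a small extension method around SelectNodes; SelectNodesOrEmpty is a hypothetical name, not part of HtmlAgilityPack. The LINQ-style Descendants method already returns an empty sequence when nothing matches, so it needs no such wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlNodeExtensions
{
    // Hypothetical helper, not part of HtmlAgilityPack: SelectNodes returns null when
    // the XPath matches nothing, so substitute an empty sequence in that case.
    public static IEnumerable<HtmlNode> SelectNodesOrEmpty(this HtmlNode node, string xpath)
    {
        IEnumerable<HtmlNode> matches = node.SelectNodes(xpath);
        return matches ?? Enumerable.Empty<HtmlNode>();
    }
}

public static class SelectNodesDemo
{
    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<p>one</p><p>two</p>");

        // Safe to foreach directly, whether or not anything matches.
        foreach (var table in doc.DocumentNode.SelectNodesOrEmpty("//table"))
            Console.WriteLine(table.OuterHtml); // never runs: there is no <table>

        Console.WriteLine(doc.DocumentNode.SelectNodesOrEmpty("//p").Count()); // prints 2
    }
}
```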

18 votes, 5 answers

Stripping all html tags with Html Agility Pack

I have an HTML string like this:

foo bar baz

I wish to strip all HTML tags so that the resulting string becomes:

foo bar baz

From another post here at SO I've come up with this…
Muleskinner • 14,150 • 19 • 58 • 79
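One common approach to the question above, sketched under the assumption that only the visible text is wanted, is to load the fragment and read the document node's InnerText, which concatenates all text nodes and drops the markup. InnerText does not necessarily decode entities such as &amp;, so HtmlEntity.DeEntitize is applied afterwards. The contents of script and style elements are text nodes too, so they would need removing first if the input contains them. The class name is hypothetical.

```csharp
using System;
using HtmlAgilityPack;

public static class HtmlStripper
{
    // Hypothetical helper: returns the text content of an HTML fragment without its tags.
    public static string StripTags(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText joins all descendant text nodes; DeEntitize decodes entities such as &amp;.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }

    public static void Main()
    {
        Console.WriteLine(StripTags("<p>foo <b>bar</b> baz</p>")); // prints "foo bar baz"
    }
}
```

Whitespace between block elements is kept exactly as it appears in the source, so the result may need trimming or normalizing depending on the input.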

18 votes, 5 answers

Get a value of an attribute by HtmlAgilityPack

I want to get a value of an attribute by HtmlAgilityPack. Html code:
denied • 311 • 1 • 5 • 18

18 votes, 3 answers

GetElementsByTagName in Htmlagilitypack

How do I select an element, e.g. a textbox, if I don't know its id? If I know its id then I can simply write: HtmlAgilityPack.HtmlNode node = doc.GetElementbyId(id); But I don't know the textbox's ID and I can't find a GetElementsByTagName method in…
Ali • 1,801 • 6 • 43 • 58

17 votes, 4 answers

HTML Agility pack removes break tag close

I am creating an HTML document using HTML Agility Pack. I load a template file, then append content to it. All of this works, but when I view the output file it has removed the closing tag from my <br /> tags, so they look like this: <br>. What is causing…
FarFigNewton • 7,108 • 13 • 50 • 77

17 votes, 2 answers

Htmlagilitypack: create html text node

In HtmlAgilityPack, I want to create an HtmlTextNode, which is an HtmlNode (inherits from HtmlNode) that has a custom InnerText. HtmlTextNode CreateHtmlTextNode(string name, string text) { HtmlDocument doc = new HtmlDocument(); HtmlTextNode…
Nizar Blond • 1,826 • 5 • 20 • 42
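For the text-node question above, a minimal sketch, assuming the node is meant to end up inside an existing document: HtmlDocument already provides a CreateTextNode(string) factory, so there is no need to build the node by hand or to pass a name (every text node is named #text). Creating it through the parent's OwnerDocument keeps it in the same document it is appended to. The helper name and sample markup are hypothetical.

```csharp
using System;
using HtmlAgilityPack;

public static class TextNodeDemo
{
    // Hypothetical helper: appends a new text node to the given parent element.
    public static HtmlTextNode AppendText(HtmlNode parent, string text)
    {
        // CreateTextNode is a factory on HtmlDocument; use the parent's own document.
        HtmlTextNode textNode = parent.OwnerDocument.CreateTextNode(text);
        parent.AppendChild(textNode);
        return textNode;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<div id='greeting'></div>");

        var div = doc.DocumentNode.SelectSingleNode("//div[@id='greeting']");
        AppendText(div, "Hello");
        Console.WriteLine(div.InnerText); // prints "Hello"
    }
}
```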