Questions tagged [html-agility-pack]

HTML Agility Pack is an open-source HTML parser that builds a read/write DOM and supports LINQ, plain XPath, or XSLT.

It is a .NET code library that lets you parse HTML files found "out on the web". The parser is very tolerant of malformed HTML. The object model is very similar to the one System.Xml offers, but for HTML documents or streams.

The easiest way to install HTML Agility Pack is through its NuGet package:

Install-Package HtmlAgilityPack

Latest stable release: 1.11.3 / 18 April 2019

GitHub page: https://github.com/zzzprojects/html-agility-pack

3466 questions
17
votes
4 answers

htmlagilitypack - remove script and style?

I'm using the following method to extract text from HTML: public string getAllText(string _html) { string _allText = ""; try { HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument(); …
Jacqueline
  • 481
  • 2
  • 11
  • 20
16
votes
4 answers

HtmlAgilityPack Documentation

I am new to C# (started today) and I am trying to understand someone else's code which used the HtmlDocument class in HtmlAgilityPack to parse HTML documents. I cannot find any documentation of this package. The HtmlAgilityPack's project webpage says…
Bob
  • 561
  • 1
  • 6
  • 18
16
votes
3 answers

Parsing HTML Table in C#

I have an HTML page that contains a table, and I want to parse that table in a C# Windows Forms application. This is the webpage I want to parse: http://www.mufap.com.pk/payout-report.php?tab=01 I have tried > Foreach(Htmlnode a in…
user1764351
  • 225
  • 1
  • 3
  • 12
16
votes
2 answers

Running Scripts in HtmlAgilityPack

I'm trying to scrape a particular webpage which works as follows. First the page loads, then it runs some sort of JavaScript to fetch the data it needs to populate the page. I'm interested in that data. If I get the page with HtmlAgilityPack - the…
Aabela
  • 1,408
  • 5
  • 19
  • 28
16
votes
3 answers

htmlagilitypack and dynamic content issue

I want to create a web scraper application, and I want to do it with the WebBrowser control, HtmlAgilityPack and XPath. Right now I have managed to create an XPath generator (I used WebBrowser for this purpose), which works fine, but sometimes I cannot grab…
Chyngyz Sydykov
  • 430
  • 2
  • 6
  • 18
15
votes
1 answer

How to create an html document from scratch using the HtmlAgility pack

I just wanted to create my own simple document using the agility pack so create a new HtmlDocument that contains just the basic container elements - i.e. How can I do this from scratch without actually…
Pittfall
  • 2,751
  • 6
  • 32
  • 61
15
votes
2 answers

HTML Agility Pack - How can I append an element at the top of the Head element?

I'm trying to use HTML Agility Pack to append a script element into the top of the HEAD section of my html. The examples I have seen so far just use the AppendChild(element) method to accomplish this. I need the script that I am appending to the…
Nick
  • 19,198
  • 51
  • 185
  • 312
15
votes
4 answers

HtmlAgilityPack: how to create indented HTML?

So, I am generating html using HtmlAgilityPack and it's working perfectly, but html text is not indented. I can get indented XML however, but I need HTML. Is there a way? HtmlDocument doc = new HtmlDocument(); // gen html HtmlNode table =…
Petr Abdulin
  • 33,883
  • 9
  • 62
  • 96
15
votes
4 answers

HtmlAgilityPack & Selenium Webdriver returns random results

I'm trying to scrape product names from a website. Oddly, I seem to only scrape 12 random items. I've tried both HtmlAgilityPack and HTTPClient and I get the same random results. Here's my code for HtmlAgilityPack: using…
15
votes
3 answers

ItextSharp Error on trying to parse html for pdf conversion

I was using the iTextSharp module to convert the HTML listed below into a PDF page.
karry
  • 3,270
  • 3
  • 18
  • 31
14
votes
5 answers

HTML Agility Pack Null Reference

I've got some trouble with the HTML Agility Pack. I get a null reference exception when I use this method on HTML not containing the specific node. It worked at first, but then it stopped working. This is only a snippet and there are about 10 more…
tohereknowswhen
  • 199
  • 2
  • 3
  • 9
14
votes
6 answers

Using BrowserSession and HtmlAgilityPack to login to Facebook through .NET

I'm trying to use Rohit Agarwal's BrowserSession class together with HtmlAgilityPack to log in to and subsequently navigate around Facebook. I've previously managed to do the same by writing my own HttpWebRequests. However, it then only works when I…
Karsa Olong
  • 143
  • 1
  • 1
  • 4
14
votes
3 answers

Html Agility Pack: make code look neat

Can I use Html Agility Pack to make the output look nicely indented, with unnecessary white space stripped?
Jan
  • 6,532
  • 9
  • 37
  • 48
14
votes
1 answer

Select elements with attribute data-url using HTMLAgilityPack

I'm writing a little download robot that searches for links in lower layers by itself. What I need to find are all the links in an HTML page (the links to .jpg files as well as the links to .pgn, .pdf, .html, ... files). I'm using the…
Joe Black
  • 155
  • 1
  • 1
  • 6
13
votes
2 answers

HTMLAgilityPack SelectNodes to select all elements

I am making a project in C# that's basically an image screen scraper for an image-search related game. I'm trying to use HTMLAgilityPack to select all the image elements and put them in an HTMLNodeCollection, like this: //set up for checking…
Joe Sadoski
  • 586
  • 2
  • 7
  • 22