Questions tagged [html-agility-pack]

HTML Agility Pack is an open-source HTML parser that builds a read/write DOM and supports LINQ, plain XPath, or XSLT.

It is a .NET code library that lets you parse "out of the web" HTML files. The parser is very tolerant of malformed HTML. The object model is very similar to the one System.Xml offers, but for HTML documents or streams.

The easiest way to install HTML Agility Pack is through its NuGet package:

Install-Package HtmlAgilityPack

Latest stable release: 1.11.3 / 18 April 2019

GitHub page: https://github.com/zzzprojects/html-agility-pack

3466 questions
34
votes
2 answers

HtmlAgilityPack -- Does <form> close itself for some reason?

I just wrote up this test to see if I was crazy... using System; using System.Collections.Generic; using System.Linq; using System.Text; using HtmlAgilityPack; namespace HtmlAgilityPackFormBug { class Program { static void…
mpen
  • 272,448
  • 266
  • 850
  • 1,236
33
votes
1 answer

Html Agility Pack. Load and scrape webpage

Is this the best way to get a webpage when scraping? HttpWebRequest oReq = (HttpWebRequest)WebRequest.Create(url); HttpWebResponse resp = (HttpWebResponse)oReq.GetResponse(); var doc = new…
thatsIT
  • 2,085
  • 6
  • 29
  • 43
32
votes
3 answers

HtmlAgilityPack set node InnerText

I want to replace the inner text of HTML tags with other text. I am using HtmlAgilityPack, and I use this code to extract all the text: HtmlDocument doc = new HtmlDocument(); doc.Load("some path") foreach (HtmlNode node in…
Shahin
  • 12,543
  • 39
  • 127
  • 205
30
votes
2 answers

HtmlAgilityPack replace node

I want to replace a node with a new node. How can I get the exact position of the node and do a complete replace? I've tried the following, but I can't figure out how to get the index of the node or which parent node to call ReplaceChild() on…
Omar
  • 39,496
  • 45
  • 145
  • 213
30
votes
2 answers

Html Agility Pack - Problem selecting subnode

I want to export my Asics running plan to iCal and since Asics do not offer this service, I decided to build a little scraper for my own personal use. What I want to do is to take all the scheduled runs from my plan and generate an iCal feed based…
Sebastian Brandes
  • 726
  • 1
  • 6
  • 15
30
votes
5 answers

C# html agility pack get elements by class name

I'm trying to get all the divs whose class contains a certain word:
content1
content3
I need to get all the divs whose class contains the word…
Ofer Gozlan
  • 953
  • 2
  • 9
  • 21
30
votes
2 answers

HTML Agility Pack strip tags NOT IN whitelist

I'm trying to create a function that removes HTML tags and attributes that are not in a whitelist. I have the following HTML: first text second text here some text here some text here some twxt…
Dragos Durlut
  • 8,018
  • 10
  • 47
  • 62
28
votes
2 answers

How can I use HTML Agility Pack to retrieve all the images from a website?

I just downloaded the HTMLAgilityPack and the documentation doesn't have any examples. I'm looking for a way to download all the images from a website. The address strings, not the physical image. I need to pull the…
Sergio Tapia
  • 40,006
  • 76
  • 183
  • 254
25
votes
3 answers

Loading from string instead of document/url

I just found out about HTML Agility Pack and tried it, but stumbled upon a problem. I couldn't find anything on the web, so I am trying here. Do you know how I can load the HTML from a string instead of a document/URL? Thanks.
Darko
  • 535
  • 1
  • 6
  • 15
25
votes
2 answers

Remove attributes using HtmlAgilityPack

I'm trying to create a code snippet to remove all style attributes regardless of tag using HtmlAgilityPack. Here's my code: var elements = htmlDoc.DocumentNode.SelectNodes("//*"); if (elements!=null) { foreach (var element in elements) { …
Ted Nyberg
  • 7,001
  • 7
  • 41
  • 72
25
votes
2 answers

HtmlAgilityPack : illegal characters in path

I'm getting an "illegal characters in path" error in this code. I've added "Error Occuring Here" as a comment on the line where the error occurs. var document = htmlWeb.Load(searchUrl); var hotels = document.DocumentNode.Descendants("div") …
Pranab
  • 382
  • 3
  • 10
24
votes
9 answers

C# and HtmlAgilityPack encoding problem

WebClient GodLikeClient = new WebClient(); HtmlAgilityPack.HtmlDocument GodLikeHTML = new HtmlAgilityPack.HtmlDocument(); GodLikeHTML.Load(GodLikeClient.OpenRead("www.alfa.lt")); So this code returns: "Skaitytojo klausimas psichologui: kas lemia…
August
  • 490
  • 3
  • 5
  • 18
24
votes
2 answers

HtmlAgilityPack Drops Option End Tags

I am using HtmlAgilityPack. I create an HtmlDocument and LoadHtml with the following string: This does some unexpected…
Tim Scott
  • 15,106
  • 9
  • 65
  • 79
23
votes
2 answers

HTML Agility pack: parsing an href tag

How would I effectively parse the href attribute value from this: 7 D. Kulikov
Jean-François Beaulieu
  • 4,305
  • 22
  • 74
  • 107