Questions tagged [html-agility-pack]

HTML Agility Pack is an open-source HTML parser that builds a read/write DOM and supports LINQ, plain XPath, or XSLT.

It is a .NET code library for parsing HTML files as they are found on the web. The parser is very tolerant of malformed HTML. The object model is very similar to the one System.Xml offers, but for HTML documents or streams.

The easiest way to install HTML Agility Pack is through its NuGet package:

Install-Package HtmlAgilityPack

Latest stable release: 1.11.3 / 18 April 2019

GitHub page: https://github.com/zzzprojects/html-agility-pack

3466 questions
13
votes
1 answer

XPath Select all children with specific parent node by attribute

I want to select all children, i.e. images, whose parent is the div with id testRoot. The structure is unknown; I have simplified it here for clarity. An XPath expression would be great.
Idrees Khan
13
votes
1 answer

Select only items in a specific DIV using HtmlAgilityPack

I'm trying to use the HtmlAgilityPack to pull all of the links from a page that are contained within a div declared as
However, when I use the code below I simply get ALL links on the entire page. This doesn't really make…
Adam Haile
13
votes
2 answers

Get links in a class with HTML Agility Pack

There are a bunch of tr's with the class alt. I want to get all the links (or the first or last), yet I can't figure out how with HTML Agility Pack. I tried variants of a, but I only get all the links or none. It doesn't seem to only get the one in the…
user34537
13
votes
1 answer

Parsing HTML to get script variable value

I'm trying to find a way to access data between tags returned by a server I am making HTTP requests to. The document has multiple tags, but only one of them contains JavaScript code; the rest are included from files. I want to…
James Jeffery
13
votes
1 answer

HtmlAgilityPack is not removing all HTML tags. How can I solve this efficiently?

I am using the following method to strip all HTML from the string: public static string StripHtmlTags(string html) { if (String.IsNullOrEmpty(html)) return ""; HtmlAgilityPack.HtmlDocument doc = new…
Obsivus
13
votes
4 answers

Remove HTML node from HtmlDocument: HtmlAgilityPack

In my code, I want to remove the img tag which doesn't have a src value. I am using HtmlAgilityPack's HtmlDocument object. I am finding the img which doesn't have a src value and trying to remove it, but it gives me the error Collection was…
Priya
12
votes
1 answer

HtmlAgilityPack - get all nodes in a document

I would like to traverse all nodes in a document using HtmlAgilityPack. Will foreach (HtmlNode node in myhtml.DocumentNode.SelectNodes("//@")) do?
kiki
12
votes
2 answers

Extracting inner text from HTML BODY node with Html Agility Pack

Need a bit of help with HTML Agility Pack! Basically I want to grab plain text within the body node of the HTML. So far I have tried this in VB.NET and it fails to return the InnerText, meaning no change is seen, well, at least from what I can…
KJSR
12
votes
1 answer

HTML Agility Pack: create new HtmlNode

I'm using HTML Agility Pack to parse and transform an HTML file, but I get an exception "Item has already been added" when I try to create a new HtmlNode because of the index parameter. HtmlNode node1 = new HtmlNode(HtmlNodeType.Element, doc, 0);…
Diogo Cardoso
12
votes
5 answers

Parsing HTML with the HTML Agility Pack and LINQ

I have the following HTML (..) Test1 Data Data 2 Test2 Data2 Data 2…
Timo Willemsen
12
votes
2 answers

HTML Agility Pack - using XPath to get a single node - Object Reference not set to an instance of an object

This is my first attempt to get an element value using HAP. I'm getting a null object error when I try to use InnerText. The URL I am scraping is: http://www.mypivots.com/dailynotes/symbol/659/-1/e-mini-sp500-june-2013 I am trying to get the…
dontpanic
12
votes
3 answers

Log in to a website using HtmlAgilityPack

In the code below, I can set the value of the username and password using HtmlAgilityPack, but I cannot invoke the click event of the login button (the id of the button in the source code is "s1"). Is there any way for this to be done? The reason…
touyets
11
votes
3 answers

Get a value of an attribute by XPath and HtmlAgilityPack

I have an HTML document and I parse it with XPath. I want to get the value of the input element, but it doesn't work. My HTML:
Chani Poz
11
votes
2 answers

How to strip comments from HTML using Agility Pack without losing DOCTYPE

I am trying to remove unnecessary content from HTML. Specifically, I want to remove comments. I found a pretty good solution (Grabbing meta-tags and comments using HTML Agility Pack); however, the DOCTYPE is treated as a comment and therefore removed…
desautelsj
11
votes
7 answers

Selecting attribute values with HTML Agility Pack

I'm trying to retrieve a specific image from an HTML document, using HTML Agility Pack and this XPath: //div[@id='topslot']/a/img/@src As far as I can see, it finds the src attribute, but it returns the img tag. Why is that? I would expect the…
Vegar