Questions tagged [html-parsing]

HTML parsing is the process of consuming a serialization of an HTML document and producing a representation that you can work with programmatically, e.g., in order to extract data from it. It typically involves converting the document to a tree-based Document Object Model (DOM).

The HTML specification defines a standard algorithm for parsing HTML, which is implemented in all major browsers: https://html.spec.whatwg.org/multipage/parsing.html#parsing
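As a rough illustration of converting a document into a tree, here is a minimal sketch using Python's standard-library `html.parser` module. `HTMLParser` is event-based: it reports start tags, end tags, and text, but does not build a tree itself. It also does not implement the WHATWG tree-construction algorithm, so the tree-building rules below (void elements never get children, an end tag closes back to the nearest matching open element) are simplified assumptions. The names `Node`, `TreeBuilder`, `parse`, and `dump` are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never take an end tag (the HTML "void elements").
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    """One element in the tree; children are Nodes or text strings."""

    def __init__(self, tag, attrs=(), parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Turns HTMLParser's start/end/data events into a simple tree.

    Simplified rules, NOT the WHATWG tree-construction algorithm.
    """

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root  # element that new children attach to

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node  # descend; void elements stay childless

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: attach without descending.
        self.current.children.append(Node(tag, attrs, self.current))

    def handle_endtag(self, tag):
        # Close back to the nearest open element with this tag, implicitly
        # closing anything left open inside it; stray end tags are ignored.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text for readability
            self.current.children.append(data)


def parse(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


def dump(node, depth=0):
    """Render the tree as indented lines: tag names and quoted text."""
    lines = []
    for child in node.children:
        if isinstance(child, str):
            lines.append("  " * depth + repr(child.strip()))
        else:
            lines.append("  " * depth + child.tag)
            lines.extend(dump(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(dump(parse("<ul><li>one</li><li>two</li></ul><p>hi<br>there</p>"))))
```

The close-back-to-nearest-match rule in `handle_endtag` is just one simple way to cope with unclosed or stray tags; browsers instead follow the spec's insertion modes and adoption agency algorithm, so real-world malformed markup can produce a different tree there.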

Problem with HTML Parser in IE

I am trying to create a dialog box that will appear only if the browser is IE (any version); however, I get this error: Message: HTML Parsing Error: Unable to modify the parent container element before the child element is closed…

javascript html-parsing

asked Nov 19 '08 at 10:31

Tsundoku

3 answers

HtmlAgilityPack set node InnerText

I want to replace the inner text of HTML tags with other text. I am using HtmlAgilityPack, and I use this code to extract all texts: HtmlDocument doc = new HtmlDocument(); doc.Load("some path") foreach (HtmlNode node in…

c# parsing html-parsing html-agility-pack

asked Nov 25 '11 at 21:34

Shahin

5 answers

Android HTML ImageGetter as AsyncTask

Okay, I'm losing my mind over this one. I have a method in my program which parses HTML. I want to include the inline images, and I am under the impression that using the Html.fromHtml(string, Html.ImageGetter, Html.TagHandler) will allow this to…

android html-parsing drawable android-asynctask

asked Sep 15 '11 at 00:16

Nick

12 answers

jQuery-like interface for PHP?

I was curious as to whether or not there exists a jQuery-style interface/library for PHP for handling HTML/XML files, specifically using jQuery-style selectors. I'd like to do things like this (all hypothetical): foreach (j("div > p > a") as…

php jquery html xml html-parsing

asked Sep 01 '09 at 19:11

theotherlight

2 answers

HTML Agility Pack strip tags NOT IN whitelist

I'm trying to create a function that removes HTML tags and attributes that are not in a whitelist. I have the following HTML: first text second text here some text here some text here some text…

c# tags html-parsing html-agility-pack sanitize

asked Jun 24 '10 at 05:52

Dragos Durlut

1 answer

HtmlAgility - Save parsing to a string

Just tried using the HtmlAgility Pack for the first time and have a problem. First I load in from a string variable. string NewsText = dr["Message"].ToString(); HtmlAgilityPack.HtmlDocument htmlDoc = new…

c# parsing html-parsing

asked Feb 24 '11 at 16:15

larschanders

3 answers

Python BeautifulSoup scrape tables

I am trying to create a table scraper with BeautifulSoup. I wrote this Python code: import urllib2 from bs4 import BeautifulSoup url = "http://dofollow.netsons.org/table1.htm" # change to whatever your url is page =…

python html web-scraping beautifulsoup html-parsing

asked Sep 23 '13 at 18:35

kingcope
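For the table-scraping question above, the same idea can be sketched without BeautifulSoup by using Python 3's standard-library `html.parser` instead; this is a swapped-in technique, not the asker's approach, and fetching the page (done with `urllib2` in the question) is left out. `TableCollector` and `scrape_tables` are hypothetical names, and the sketch assumes tables are not nested and ignores `rowspan`/`colspan`.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collects each <table> as a list of rows; each row is a list of cell strings.

    Assumes non-nested tables; rowspan/colspan are ignored.
    """

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the <td>/<th> being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._finish_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_row()

    def _finish_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _finish_row(self):
        self._finish_cell()
        if self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def scrape_tables(markup):
    parser = TableCollector()
    parser.feed(markup)
    parser.close()
    return parser.tables
```

Implied end tags for cells and rows are handled by closing the open cell or row whenever a new one starts, which covers common sloppy markup such as `<td>a<td>b`.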

4 answers

How can I add "current streak" of contributions from GitHub to my blog?

I have a personal blog I built using Rails. I want to add a section to my site that displays my current streak of GitHub contributions. What would be the best way to go about doing this? Edit: for clarification, here is what I want: just the number of…

html ruby-on-rails ruby github html-parsing

asked Apr 12 '13 at 18:55

Ox Smith

1 answer

Get text content of an HTML element using XPath?

See this HTML: Monitor $300 Add to cart Keyboard $20 Add to cart Using xpath…

html xml xpath html-parsing

asked Jan 31 '13 at 17:25

Genghis Khan
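The markup in the XPath question above did not survive, so this sketch assumes hypothetical, well-formed XHTML for the product list; the `product`, `name`, and `price` class names are assumptions. Python's `xml.etree.ElementTree` supports only a limited XPath subset (for example `[@attr='value']` predicates, but no `text()` function), so the sketch selects elements with `findall` and reads their text content with `itertext()`. Real-world HTML that is not well-formed XML would need an HTML parser first. `text_of` and `products` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for the question's lost HTML.
MARKUP = """
<div>
  <div class="product">
    <span class="name">Monitor</span> <span class="price">$300</span>
    <button>Add to cart</button>
  </div>
  <div class="product">
    <span class="name">Keyboard</span> <span class="price">$20</span>
    <button>Add to cart</button>
  </div>
</div>
"""


def text_of(markup, path):
    """Return the stripped text content of every element matching `path`.

    `path` uses ElementTree's XPath subset, e.g. ".//span[@class='price']".
    """
    root = ET.fromstring(markup.strip())
    return ["".join(el.itertext()).strip() for el in root.findall(path)]


def products(markup):
    """Pair each product's name with its price."""
    root = ET.fromstring(markup.strip())
    return [
        (p.find("span[@class='name']").text, p.find("span[@class='price']").text)
        for p in root.findall(".//div[@class='product']")
    ]
```

With a full XPath 1.0 engine (for example lxml), an expression ending in `/text()` could return the strings directly; the `itertext()` call is the ElementTree workaround.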

3 answers

How do I convert a document made in Jsoup (the Java HTML parser) into a string?

I have a document that was made in jsoup that looks like this: Document doc = Jsoup.connect("http://en.wikipedia.org/").get(); How do I convert that doc into a string?

java html-parsing jsoup html-parser

asked Jul 28 '11 at 20:13

Hudson Hughes

9 answers

Is it possible to get data from HTML forms into android while using webView?

I'm making a very simple HTML form, viewed in Android through a WebView, that takes in your name using a text box; when you click the button, it displays the name in a paragraph. It's made using both HTML and JavaScript. This is my…

javascript android webview html-parsing code-injection

asked Nov 30 '16 at 13:05

Shariq Musharaf

1 answer

Differences between .text and .get_text()

In BeautifulSoup, is there any difference between .text and .get_text()? Which one should be preferred for getting an element's text? >>> from bs4 import BeautifulSoup >>> >>> html = "text1 text2" >>> soup = BeautifulSoup(html,…

python html beautifulsoup html-parsing

asked Feb 19 '16 at 02:37

alecxe

6 answers

Parsing HTML in Python

What's my best bet for parsing HTML if I can't use BeautifulSoup or lxml? I've got some code that uses SGMLlib, but it's a bit low-level and it's now deprecated. I would prefer if it could stomach a bit of malformed HTML, although I'm pretty sure…

python html-parsing

asked Apr 04 '09 at 18:11

Andy Baker
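For "Parsing HTML in Python" above: `sgmllib` no longer exists in Python 3, and the standard library's `html.parser` module (`HTMLParser`) is the usual dependency-free option. It tolerates some sloppy markup, such as uppercase tag and attribute names (which it lowercases) and unclosed elements, but it only reports events and does not repair the document the way browsers do. A minimal sketch, where the link-collecting task and the `LinkCollector`/`collect_links` names are illustrative assumptions:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> elements, stdlib only."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased, even for <A HREF=...>.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_links(markup):
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links
```

For example, `collect_links('<P>Docs: <A HREF="/docs">the docs</A><p>Home: <a href="/">home</a>')` should give `[("/docs", "the docs"), ("/", "home")]` despite the mixed case and the unclosed `<p>` elements.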

5 answers

JavaScript DOM childNodes.length also returning number of text nodes

In JavaScript DOM, childNodes.length returns the number of both element and text nodes. Is there any way to count only the number of element-only child nodes? For example, childNodes.length of div#posts will return 6, when I expected 2:

javascript html dom html-parsing

asked Jun 15 '11 at 07:45

Samuel Liew

6 answers

Extract links from a web page using Go lang

I am learning Google's Go programming language. Does anyone know the best practice for extracting all URLs from an HTML web page? I come from the Java world, where there are libraries to do the job, for example jsoup, htmlparser, etc. But for Go, I guess…

html-parsing go

asked Jun 18 '12 at 10:24

Jifeng Zhang
