Highest Voted 'lxml.html' Questions

4

votes

1 answer

Python: how to decode only a specific part of XML using suds MessagePlugin and lxml

I am fetching product information from an endpoint. To parse that information I am using a filter, which is a suds MessagePlugin. The incoming data looks as follows (this is not the whole request; it contains a small part of…

asked Sep 22 '21 at 06:50

bufferoverflow

81
1
1
4

4

votes

2 answers

Python: Convert Raw String to Bytes String without adding escape characters

I have a string: 'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084' And I want: b'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084' But I…

python python-3.x lxml lxml.html bz2

asked Jul 21 '18 at 15:20

Bryan Yao

65
2
7
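
One possible approach to the conversion asked about above, offered as a minimal sketch rather than the accepted answer: if every character in the string has a code point below 256 (as in the sample shown in the excerpt), encoding with Latin-1 maps each character to the byte of the same value. The function name `text_to_bytes` and the sample payload are hypothetical.

```python
import bz2


def text_to_bytes(text: str) -> bytes:
    """Turn a str whose characters are all below U+0100 into bytes.

    Latin-1 maps code points 0..255 to the identical byte values, so no
    escape characters are introduced. A UnicodeEncodeError is raised if
    the string contains any code point above 255.
    """
    return text.encode("latin-1")


# Hypothetical round trip: bz2 data that ended up stored as a str.
payload = bz2.compress(b"hello lxml")
as_text = payload.decode("latin-1")   # simulate the mis-typed string
restored = text_to_bytes(as_text)     # back to the original bytes
print(bz2.decompress(restored))
```

If the string can contain characters above U+00FF, this sketch does not apply and the data was probably decoded with a different codec upstream.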

4

votes

1 answer

Python lxml, removing parent elements before outputting HTML (using fragment_fromstring)

I'm using lxml to parse some HTML fragments (from an RSS feed), and in order to do this efficiently I use create_parent='div'. When I later output the HTML I don't want the parent div to be included since with my html layout it ends up being a…

python html-parsing lxml lxml.html

asked Jun 29 '13 at 14:32

Alexander Kuzmin

1,120
8
16
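
A minimal sketch of one way to serialize such a fragment without its wrapper, assuming the wrapper was created with `create_parent='div'` as in the excerpt. The helper name `inner_html` and the sample markup are hypothetical. It relies on `lxml.html.tostring` including each child's tail text by default.

```python
import lxml.html


def inner_html(element):
    """Serialize the contents of an element, leaving out the element itself."""
    # Text that appears before the first child lives on the parent.
    parts = [element.text or ""]
    for child in element:
        # tostring() emits the child's markup followed by its tail text.
        parts.append(lxml.html.tostring(child, encoding="unicode"))
    return "".join(parts)


fragment = lxml.html.fragment_fromstring(
    "Hello <b>world</b> and <i>more</i>", create_parent="div"
)
print(inner_html(fragment))
```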

4

votes

1 answer

How to insert a HTML element in a tree of lxml.html

I am using Python 3.3 and lxml 3.2.0. Problem: I have a web page in a variable webpageString = "webpage content" and I want to insert a CSS link tag between the two header tags, so that I get webpageString =…

python lxml lxml.html

asked Jun 21 '13 at 12:49

user1986258

43
1
4
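
A hedged sketch of inserting a stylesheet link into the head of a parsed page. The sample page and the `style.css` href are hypothetical stand-ins for the `webpageString` content, which the excerpt does not show.

```python
import lxml.html
from lxml import etree

# Hypothetical page content; the real webpageString is not shown above.
page = "<html><head><title>Demo</title></head><body><p>Hi</p></body></html>"

tree = lxml.html.fromstring(page)
head = tree.find("head")

# SubElement appends the new element as the last child of <head>.
etree.SubElement(head, "link", rel="stylesheet", href="style.css")

result = lxml.html.tostring(tree, encoding="unicode")
print(result)
```

If the page has no head element, `tree.find("head")` returns `None`, so real input would need a check before the insert.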

3

votes

3 answers

Scraping dynamic html fields with lxml

I have been trying to scrape a dynamic field of an HTML page using lxml. The code is pretty simple and is below: from lxml import html import requests page = requests.get('http://www.airmilescalculator.com/distance/blr-to-cdg/') tree =…

python html web-scraping lxml lxml.html

asked Feb 04 '16 at 16:38

Tauseef Hussain

1,049
4
15
29

3

votes

1 answer

python lxml.html: proper way to iterate through text with .tail in docstring order

I'm trying to traverse an HTML string and concatenate the text content, with a string joiner that varies with the type of HTML tag encountered. Example HTML: html_str=' This is how we par^se our string …

python html-parsing lxml lxml.html

asked Nov 16 '15 at 18:54

deseosuho

958
3
10
28
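
To illustrate the text/tail model this question is about, here is a small sketch of a document-order traversal. The generator name `iter_text` and the sample markup are hypothetical, since the original example string did not survive in the excerpt. In lxml, an element's `.text` is the text before its first child, and `.tail` is the text that follows its closing tag.

```python
import lxml.html


def iter_text(element):
    """Yield text pieces of an element tree in document order."""
    if element.text:
        yield element.text
    for child in element:
        # Everything inside the child comes first...
        yield from iter_text(child)
        # ...then the text that follows the child's closing tag.
        if child.tail:
            yield child.tail


root = lxml.html.fragment_fromstring(
    "<div>This is <b>how</b> we <i>parse</i> text</div>"
)
pieces = list(iter_text(root))
print(pieces)
```

A tag-dependent joiner could be added by yielding `(child.tag, text)` pairs instead of bare strings. For plain concatenation, lxml's built-in `itertext()` covers the same ground.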

3

votes

2 answers

python lxml: syntax for selectively deleting inline style attributes?

I'm using Python 3.4 with the lxml.html library. I'm trying to remove the border-bottom inline styling from HTML elements that I've targeted with a CSS selector. Here's a code fragment showing a sample td element and my selector: html_snippet =…

python css html-parsing lxml lxml.html

asked Sep 16 '15 at 20:21

deseosuho

958
3
10
28
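
One way to approach the style-attribute problem is to treat the attribute value as a plain string and filter its declarations. The sketch below uses only the standard library. The function name `drop_style_property` and the sample style string are hypothetical, and the naive split on `;` assumes no semicolons appear inside values such as `url(...)`.

```python
def drop_style_property(style: str, prop: str) -> str:
    """Return an inline style string with one property removed."""
    kept = []
    for declaration in style.split(";"):
        name, sep, _value = declaration.partition(":")
        if not sep:
            continue  # skip empty or malformed pieces
        if name.strip().lower() != prop.lower():
            kept.append(declaration.strip())
    return "; ".join(kept)


cleaned = drop_style_property(
    "border-bottom: 1px solid #000; color: red; padding:0", "border-bottom"
)
print(cleaned)
```

With lxml, the result could be written back onto each matched element using `el.set('style', drop_style_property(el.get('style', ''), 'border-bottom'))`.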

3

votes

2 answers

lxml and <noscript> in <head>

I got a strange bug with lxml: >>> s = '<html><head><noscript>' >>> root = lxml.html.fromstring(s) >>> root.xpath('/html/head/meta') >>> root.xpath('/html/body/meta') [

python lxml noscript lxml.html

asked Sep 07 '15 at 11:29

user4801897

3

votes

1 answer

using lxml to find the literal text of url links

(Python 3.4.2) First off, I'm pretty new to Python: more than a beginner but less than an intermediate user. I'm trying to display the literal text of URLs in a page by using lxml. I think I've ALMOST got it, but I'm missing something. I can get…

python python-3.x lxml lxml.html

asked Dec 08 '14 at 18:21

GreenRaccoon23

3,603
7
32
46

3

votes

3 answers

store html in python

I'm using both XPath and BeautifulSoup to scrape a web page. XPath needs a tree as input and BeautifulSoup needs a soup as input. Here's the code to get the tree and the soup: def get_tree(url): r = requests.get(url) tree = html.fromstring(r.content) …

python html beautifulsoup lxml lxml.html

asked Nov 07 '14 at 21:50

f4fc2791e4473eb2ba41b5ddb445b2

285
6
15

3

votes

1 answer

TypeError: decoding Unicode is not supported python

I am using lxml.html to parse an HTML file and get the text from the page. But now I have a string that contains the character ', for example Florian's, and because of it I get a traceback while printing the output: parent_link_id_text = …

python python-2.7 unicode-string lxml.html

asked Jul 17 '13 at 13:30

Sangamesh Hs

1,447
3
24
39

2

votes

2 answers

How to keep all html elements with selector but drop all others?

I would like to get an HTML string without certain elements. However, up front I only know which elements to keep, not which ones to drop. Let's say I just want to keep all p and a tags inside the div with class="A". Input:

…

python lxml lxml.html

asked Sep 13 '21 at 12:34

Wuff

257
1
8

2

votes

0 answers

XPath gets very different results between google chrome's XPath Helper tool vs lxml.html

I have an XPath expression that works perfectly in Google Chrome's XPath Helper tool. Using this web page (enter link description here), paste this into the XPath tool: //dd[@class='open-hours']//div//span/following-sibling::text() and you will get…

python xpath lxml.html

asked May 01 '21 at 20:21

spacedog

446
3
13

2

votes

1 answer

lxml xpath expression help needed

I have the below HTML from a view:source of a webpage

python-3.x lxml xml.etree lxml.html

asked Aug 24 '18 at 15:07

Shekhar Samanta

875
2
12
25

2

votes

1 answer

Using XPath, select node without text sibling

I want to extract some HTML elements with python3 and the HTML parser provided by lxml. Consider this HTML: bar foo Consider…

python-3.x xpath lxml.html

asked Feb 26 '18 at 14:18

Hermann

604
7
23

