Questions tagged [lxml.html]

lxml.html is a dedicated Python package for dealing with HTML. It is based on lxml's HTML parser, but provides a special Element API for HTML elements, as well as a number of utilities for common HTML processing tasks.
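
As a rough illustration of the Element API described above, here is a minimal sketch that parses a small fragment and reads text and attributes from it. The HTML snippet and the variable names are made up for this example and do not come from any question on this page.

```python
import lxml.html

# Hypothetical sample markup, used only for illustration.
snippet = '<div><p class="intro">Hello <b>world</b>!</p><a href="/docs">Docs</a></div>'

# fromstring() returns an HtmlElement; here it is the <div>.
doc = lxml.html.fromstring(snippet)

# A relative XPath query, so the search starts at the parsed element.
para = doc.xpath('.//p[@class="intro"]')[0]

# text_content() joins the text of the element and all of its descendants.
text = para.text_content()

# Collect the href attribute of every <a> element below the root.
links = [a.get('href') for a in doc.iter('a')]

print(text)
print(links)
```

This assumes lxml is installed; `fromstring`, `xpath`, `text_content`, `iter` and `get` are part of the lxml API, while everything else is placeholder data.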

159 questions
4
votes
1 answer

Python how to decode only specific part in xml with using suds MessagePlugin and lxml

I am fetching product information from an endpoint. To parse that information I am using a filter, which is a suds MessagePlugin. The incoming data looks as follows: (This does not contain the whole request. It contains a small part of…
bufferoverflow
  • 81
  • 1
  • 1
  • 4
4
votes
2 answers

Python: Convert Raw String to Bytes String without adding escape characters

I have a string: 'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084' And I want: b'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084' But I…
Bryan Yao
  • 65
  • 2
  • 7
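
One possible approach to the conversion asked about above, sketched under the assumption that every character in the string has a code point below 256 (which holds for the string shown in the excerpt). The excerpt is truncated, so what the asker already tried is unknown. The latin-1 codec maps code points 0 to 255 one-to-one onto byte values, so no escape characters are added.

```python
# The string from the question excerpt; all of its code points are below 256.
text = 'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'

# latin-1 encodes each code point 0-255 as the single byte with the same value.
raw = text.encode('latin-1')

print(type(raw), len(raw) == len(text))
```

If the string contained a character above U+00FF, `encode('latin-1')` would raise `UnicodeEncodeError`, so this sketch only fits data that was originally bytes read as text.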
4
votes
1 answer

Python lxml, removing parent elements before outputting HTML (using fragment_fromstring)

I'm using lxml to parse some HTML fragments (from an RSS feed), and in order to do this efficiently I use create_parent='div'. When I later output the HTML I don't want the parent div to be included since with my HTML layout it ends up being a…
Alexander Kuzmin
  • 1,120
  • 8
  • 16
4
votes
1 answer

How to insert an HTML element in a tree of lxml.html

I am using Python 3.3 and lxml 3.2.0. Problem: I have a web page in a variable webpageString = "webpage content" and I want to insert a CSS link tag between the two header tags, so that I get webpageString =…
user1986258
  • 43
  • 1
  • 4
3
votes
3 answers

Scraping dynamic html fields with lxml

I have been trying to scrape a dynamic field of an HTML page using lxml. The code is pretty simple and is below: from lxml import html import requests page = requests.get('http://www.airmilescalculator.com/distance/blr-to-cdg/') tree =…
Tauseef Hussain
  • 1,049
  • 4
  • 15
  • 29
3
votes
1 answer

python lxml.html: proper way to iterate through text with .tail in docstring order

I'm trying to traverse an html string and concatenate the text content, with a string joiner that varies with the type of html tag encountered. Example html: html_str='

This is how
we
parse
our string

deseosuho
  • 958
  • 3
  • 10
  • 28
3
votes
2 answers

python lxml: syntax for selectively deleting inline style attributes?

I'm using python 3.4 with the lxml.html library. I'm trying to remove the border-bottom in-line styling from html elements that I've targeted with a css selector. Here's a code fragment showing a sample td element and my selector: html_snippet =…
deseosuho
  • 958
  • 3
  • 10
  • 28
3
votes
2 answers

lxml and

I got a strange bug with lxml: >>> s = '' >>> root = lxml.html.fromstring(s) >>> root.xpath('/html/head/meta') >>> root.xpath('/html/body/meta') [
user4801897
3
votes
1 answer

using lxml to find the literal text of url links

(Python 3.4.2) First off, I'm pretty new to Python: more than a beginner but less than an intermediate user. I'm trying to display the literal text of URLs in a page by using lxml. I think I've ALMOST got it, but I'm missing something. I can get…
GreenRaccoon23
  • 3,603
  • 7
  • 32
  • 46
3
votes
3 answers

store html in python

I'm using both XPath and BeautifulSoup to scrape a webpage. XPath needs a tree as input and BeautifulSoup needs a soup as input. Here is the code to get the tree and the soup: def get_tree(url): r = requests.get(url) tree = html.fromstring(r.content) …
3
votes
1 answer

TypeError: decoding Unicode is not supported python

I am using lxml.html to parse an HTML file and get the text from the page. But now I have a string containing the character ', for example Florian's, which causes a traceback when printing the output: parent_link_id_text = …
Sangamesh Hs
  • 1,447
  • 3
  • 24
  • 39
2
votes
2 answers

How to keep all html elements with selector but drop all others?

I would like to get an HTML string without certain elements. However, upfront I only know which elements to keep, not which ones to drop. Let's say I just want to keep all p and a tags inside the div with class="A". Input:
Wuff
  • 257
  • 1
  • 8
2
votes
0 answers

XPath gets very different results between google chrome's XPath Helper tool vs lxml.html

I have an XPath expression that works perfectly in Google Chrome's XPath Helper tool. Using this web page (enter link description here), paste this into the XPath tool: //dd[@class='open-hours']//div//span/following-sibling::text() and you will get…
spacedog
  • 446
  • 3
  • 13
2
votes
1 answer

lxml xpath expression help needed

I have the below HTML from a view:source of a webpage
Shekhar Samanta
  • 875
  • 2
  • 12
  • 25
2
votes
1 answer

Using XPath, select node without text sibling

I want to extract some HTML elements with python3 and the HTML parser provided by lxml. Consider this HTML: bar foo Consider…
Hermann
  • 604
  • 7
  • 23