Questions tagged [lxml]

lxml is a full-featured, high-performance Python library for processing XML and HTML.

Questions that concern the lxml Python library should have this tag. Per the lxml website, "The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt." The library's lxml.etree package is used for XML processing. lxml.html.soupparser uses BeautifulSoup to parse broken HTML, and lxml.html.html5parser uses html5lib's HTML5 parsing algorithm.
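As a rough illustration of the kind of code questions under this tag usually involve, here is a minimal sketch that parses a small XML string with lxml.etree and a small HTML string with lxml.html. The markup, element names and the `/page` link are made up for the example and do not come from any question listed here.

```python
from lxml import etree, html

# Hypothetical XML input: lxml.etree expects well-formed markup.
xml_source = "<people><person><name>Foo Bar</name></person></people>"
root = etree.fromstring(xml_source)
name = root.find("person/name").text

# Hypothetical HTML input with an unclosed <p>: lxml.html tolerates this.
html_source = "<div><a href='/page'>text_url</a><p>unclosed paragraph</div>"
doc = html.fromstring(html_source)
hrefs = doc.xpath("//a/@href")        # attribute values of every <a>
link_text = doc.xpath("//a")[0].text  # text of the first <a>

print(name, hrefs, link_text)
```

Both parsers return element objects that support `find` and `xpath`, so the same query style works for XML and HTML input.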

Links:

https://lxml.de/ - Contains API documentation and tutorials

https://www.ibm.com/developerworks/xml/library/x-hiperfparse/ - IBM developerWorks page on lxml

5412 questions

votes

2 answers

Why doesn't etree.find find the element in the provided example?

Let's suppose it has this: xml_as_str = ''' Foo Bar foo@bar.com ''' from lxml import etree tree = etree.fromstring(xml_as_str, etree.XMLParser(recover=True)) How could it…

asked Nov 07 '14 at 14:53

trinchet

6,753
4
37
60

votes

1 answer

Extracting all cities in Wikipedia

http://en.wikipedia.org/wiki/List_of_cities_in_China I want to extract all city names as shown below: I use the following code (to extract only one field), where the XPath is copied from Chrome: from lxml import html import requests page =…

python python-2.7 xpath beautifulsoup lxml

asked Oct 30 '14 at 07:05

william007

17,375
25
118
194

votes

3 answers

Add / update elements at position using lxml python

I have a situation where I want to add a particular element at a given position, or update it if one is already present at that position. Ex: ? …

python python-2.7 lxml

asked Oct 23 '14 at 08:19

Vimalraj Selvam

2,155
3
23
52

votes

2 answers

Scraping paginated sites and appending output in Python

I have a simple scraping task that I would like to improve the pagination efficiency of, and append lists so that I may output the results of scraping to a common/single file. The current task is scraping municipal laws for the city of São Paulo,…

python pagination web-scraping lxml python-requests

asked Oct 07 '14 at 22:13

DV Hughes

votes

3 answers

Finding inline style with lxml.cssselector

New to this library (no more familiar with BeautifulSoup either, sadly), trying to do something very simple (search by inline style): blah blah I just want to select all tds where style="padding: 20px", but I can't…

python lxml

asked Apr 12 '10 at 02:21

ropa

votes

1 answer

Get text next to selected element in lxml / Python

I have the following HTML markup and I'd like to get the English description as plain text out of this snippet, without the "English:" label and without any tags: from lxml import etree html = '''

English:…

python html lxml elementtree

asked Oct 03 '14 at 13:26

Simon Steinberger

6,605
5
55
97

votes

1 answer

gcc Internal error on lxml installation CentOS

I am having some trouble installing lxml on CentOS-6. I have tried the solutions of some similar questions like, pip install lxml error or Setup.py: install lxml with Python2.6 on CentOS but these did not work. How to install it correctly? after…

python gcc centos pip lxml

asked Oct 03 '14 at 12:06

salmanwahed

9,450
7
32
55

votes

1 answer

How to modify XML as text in lxml

I have an XML file generated by an IDE; however, it unfortunately outputs code with newlines as BRs and seems to randomly decide where to place newlines. Example: if test = true foo; bar; endif becomes the following XHTML within an XML…

python xml python-2.7 xml-parsing lxml

asked Oct 03 '14 at 00:30

user1601333

votes

1 answer

How can I get the text with xPath between and ?

I have the HTML code and I want to parse the string that starts with "Pour all ingredients" with xPath. I have already done the trick with span and li objects. But this text does not belong to any element. How should I write the xpath? EG for li: for…

python html xpath lxml

asked Oct 01 '14 at 17:12

alex

votes

2 answers

Extracting the value by xpath in python between tags

I want to extract the parameter that I referred to in the picture below... What I have tried is: url='http://site.ir' content=requests.get(url).content tree = html.fromstring(content) print [e.text_content() for e in…

python html xpath html-parsing lxml

asked Sep 28 '14 at 18:39

MLSC

5,872
8
55
89

votes

1 answer

lxml etree and xpath returning an encoded image rather than URL for src

I want the src url of an image when I process some html, but I am getting back an encoded image. What am I doing wrong if I want the url? Given a url like: "http://www.amazon.com/Cheese-Plate-multi-purpose-mounting-plate/dp/B00CI06DWE/" And a…

python python-2.7 xpath html-parsing lxml

asked Sep 21 '14 at 22:30

dolphinkickme

votes

2 answers

Get attributes and text from Xpath query as a list

I would like to query an HTML string and extract the href attribute and the text node from a hyperlink into a list (or any other dictionary). Consider the following code: from lxml import html str = ' Text1 ' \ '

python xpath lxml

asked Sep 13 '14 at 18:34

madflow

7,718
3
39
54

votes

2 answers

integration of python into excel using pyxll... having problems with lxml module

I am new to Python. I am trying to get the meaning of a word from the internet. The standalone Python code works just fine. from lxml import html import requests url = "http://dictionnaire.reverso.net/francais-definition/" word =…

python xml excel lxml pyxll

asked Sep 09 '14 at 15:45

Ravi Gautam

votes

4 answers

How to convert XPath Element to plain html text?

I have a page:

text_url

And I want to get the element '//div/a' as plain HTML text. text_url How can I do it?

python html xpath lxml

asked Sep 05 '14 at 11:01

Anton Barycheuski

votes

2 answers

Python - Parse HTML class

I have tried in anger to parse the following representative HTML extract, using BeautifulSoup and lxml: [

Abacus Trust Company Limited
Sixty Circular Road
DOUGLAS
ISLE…

python html parsing beautifulsoup lxml

asked Sep 02 '14 at 10:32

Chris Finlayson
