Questions tagged [lxml]

lxml is a full-featured, high performance Python library for processing XML and HTML.

Questions that concern the lxml Python library should have this tag. Per the lxml website, "The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt." The library's lxml.etree module handles XML processing. For HTML, lxml.html.soupparser can parse broken HTML through BeautifulSoup, and lxml.html.html5parser uses html5lib, which implements the HTML5 parsing algorithm.
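
As a minimal sketch of the two main entry points, the snippet below parses a small XML string with lxml.etree and an HTML fragment with a missing close tag with lxml.html. Both samples are made up for illustration. soupparser and html5parser are not shown because they need the separate BeautifulSoup and html5lib packages installed.

```python
from lxml import etree, html

# Hypothetical XML sample: parse it with lxml.etree and read an attribute
root = etree.fromstring("<catalog><book id='1'>lxml</book></catalog>")
print(root.find("book").get("id"))  # 1

# Hypothetical HTML sample with an unclosed <b>; lxml.html recovers from it
doc = html.fromstring("<p>Hello <b>world</p>")
print(doc.text_content())  # Hello world
```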

Links:

https://lxml.de/ - Contains API documentation and tutorials

https://www.ibm.com/developerworks/xml/library/x-hiperfparse/ - IBM developerWorks page on lxml

5412 questions
40 votes, 1 answer

Incredibly basic lxml questions: getting HTML/string content of lxml.etree._Element?

This is such a basic question that I actually can't find it in the docs :-/ In the following: img = house_tree.xpath('//img[@id="mainphoto"]')[0] How do I get the HTML of the tag? I've tried adding html_content() but get AttributeError:…
AP257
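
A minimal sketch of the usual approach is to serialize the element with tostring(). The house_tree markup below is a hypothetical stand-in, since the asker's page is not shown. Note that tostring() also emits the element's tail text unless you pass with_tail=False.

```python
from lxml import html

# Hypothetical page standing in for the asker's house_tree
house_tree = html.fromstring(
    '<html><body><img id="mainphoto" src="house.jpg" alt="House"></body></html>'
)
img = house_tree.xpath('//img[@id="mainphoto"]')[0]

# Serialize the element back to markup; encoding="unicode" returns str, not bytes
markup = html.tostring(img, encoding="unicode")
print(markup)
```
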
40 votes, 4 answers

BeautifulSoup and lxml.html - what to prefer?

I am working on a project that will involve parsing HTML. After searching around, I found two probable options: BeautifulSoup and lxml.html Is there any reason to prefer one over the other? I have used lxml for XML some time back and I feel I will…
user225312
38 votes, 4 answers

How do I use xml namespaces with find/findall in lxml?

I'm trying to parse content in an OpenOffice ODS spreadsheet. The ods format is essentially just a zipfile with a number of documents. The content of the spreadsheet is stored in 'content.xml'. import zipfile from lxml import etree zf =…
saffsd
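
A minimal sketch, assuming a trimmed-down content.xml held in memory rather than read from the zip file: find() and findall() take a namespaces mapping from prefixes of your choosing to namespace URIs. The URIs below are the OpenDocument office and table namespaces; the sheet content itself is hypothetical.

```python
from lxml import etree

# Hypothetical, heavily trimmed stand-in for an ODS content.xml
content = b"""<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0">
  <office:body>
    <table:table table:name="Sheet1"/>
  </office:body>
</office:document-content>"""

root = etree.fromstring(content)

# Map prefixes (any names you like) to the namespace URIs used in the document
ns = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
}
tables = root.findall(".//table:table", namespaces=ns)

# Namespaced attributes are read with the {URI}localname form
names = [t.get("{%s}name" % ns["table"]) for t in tables]
print(names)  # ['Sheet1']

# Equivalent query without a mapping: spell out {URI}localname in the path
same = root.findall(".//{urn:oasis:names:tc:opendocument:xmlns:table:1.0}table")
```
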
38 votes, 6 answers

lxml runtime error: Reason: Incompatible library version: etree.so requires version 12.0.0 or later, but libxml2.2.dylib provides version 10.0.0

I have a perplexing problem. I am using Mac OS X 10.9, Anaconda 3.4.1, and Python 2.7.6, developing a web application with python-amazon-product-api. I have overcome an obstacle with installing lxml, referencing clang error: unknown argument:…
BlueFrog
38 votes, 2 answers

How to use lxml to find an element by text?

Assume we have the following html: TEXT A TEXT B TEXT C How do I make it find the element "a", which contains…
user1973386
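
A minimal sketch using XPath predicates. Because the asker's HTML did not survive, the <a> markup below is hypothetical. text()="..." compares against an element's own text nodes, while contains(., "...") tests the element's full string value, including text inside descendants.

```python
from lxml import html

# Hypothetical markup; the asker's original HTML is not shown
doc = html.fromstring(
    "<div><a href='#a'>TEXT A</a><a href='#b'>TEXT B</a><a href='#c'>TEXT C</a></div>"
)

# Exact match on the element's own text node
exact = doc.xpath('//a[text()="TEXT A"]')

# Substring match on the full string value (includes descendants' text)
partial = doc.xpath('//a[contains(., "TEXT B")]')

print([a.get("href") for a in exact + partial])  # ['#a', '#b']
```
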
38 votes, 15 answers

How do you install lxml on OS X Leopard without using MacPorts or Fink?

I've tried this and run into problems a bunch of times in the past. Does anyone have a recipe for installing lxml on OS X without MacPorts or Fink that definitely works? Preferably with complete 1-2-3 steps for downloading and building each of the…
Simon Willison
35 votes, 5 answers

Remove all javascript tags and style tags from html with python and the lxml module

I am parsing an HTML document using the http://lxml.de/ library. So far I have figured out how to strip tags from an HTML document ("In lxml, how do I remove a tag but retain all contents?"), but the method described in that post leaves all the text,…
john-charles
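
A minimal sketch using lxml.etree.strip_elements(), which removes the named elements together with their content (unlike strip_tags(), which keeps the content). The page below is hypothetical. Passing with_tail=False keeps the text that follows each removed element.

```python
from lxml import etree, html

# Hypothetical page with inline CSS and JavaScript
doc = html.fromstring(
    "<html><head><style>p {color: red}</style>"
    "<script>alert('hi')</script></head>"
    "<body><p>Keep me<script>track()</script> and me</p></body></html>"
)

# Remove <script> and <style> elements along with everything inside them;
# with_tail=False preserves the text that follows each removed element
etree.strip_elements(doc, "script", "style", with_tail=False)

print(html.tostring(doc, encoding="unicode"))
print(doc.text_content())  # Keep me and me
```
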
35 votes, 2 answers

python lxml - modify attributes

from lxml import objectify, etree root = etree.fromstring('''
Joao Figueiredo
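
A minimal sketch with plain lxml.etree; the asker also imports objectify, which this example does not need. The <config>/<item> document is hypothetical. set() adds or overwrites an attribute (values must be strings), and .attrib behaves like a dict for reading and deleting.

```python
from lxml import etree

# Hypothetical document; the asker's original XML is truncated
root = etree.fromstring(
    '<config><item name="a" value="1"/><item name="b" value="2"/></config>'
)

for item in root.iter("item"):
    # set() overwrites the existing attribute in place; values must be str
    item.set("value", str(int(item.get("value")) * 10))

# .attrib is dict-like, so an attribute can be removed with del
del root.find("item").attrib["name"]

print(etree.tostring(root, encoding="unicode"))
# <config><item value="10"/><item name="b" value="20"/></config>
```
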
35 votes, 4 answers

Building lxml for Python 2.7 on Windows

I am trying to build lxml for Python 2.7 on a 64-bit Windows machine. I couldn't find an lxml egg for Python 2.7, so I am compiling it from source. I am following the instructions on this site http://lxml.de/build.html under the static linking section.…
Kamal
35 votes, 7 answers

Installing lxml with pip in virtualenv Ubuntu 12.10 error: command 'gcc' failed with exit status 4

I'm getting the following error when trying to run "pip install lxml" in a virtualenv on Ubuntu 12.10 x64. I have Python 2.7. I have seen other related questions here about the same problem and tried installing python-dev, libxml2-dev and…
Cristian Rojas
34 votes, 6 answers

Get the inner HTML of an element in lxml

I am trying to get the HTML content of a child node with lxml and xpath in Python. As shown in the code below, I want to find the HTML content of each of the product nodes. Does it have any methods like product.html? productGrids =…
Sudip Kafle
33 votes, 6 answers

ImportError: cannot import name 'etree' on Python 3.6

I am getting an error while running "from lxml import etree" on Python 3.6: >>> import lxml >>> from lxml import etree Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: cannot import name 'etree' The same is working on…
Amit Kumar
33 votes, 3 answers

Why is lxml.etree.iterparse() eating up all my memory?

This eventually consumes all my available memory and then the process is killed. I've tried changing the tag from schedule to 'smaller' tags but that didn't make a difference. What am I doing wrong / how can I process this large file with…
sente
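
iterparse() still builds the full tree as it reads, so memory keeps growing unless processed elements are discarded. Below is a minimal sketch of the commonly used clear-and-delete pattern; the in-memory <tv>/<schedule> data is a hypothetical stand-in for the asker's large file.

```python
import io
from lxml import etree

# Hypothetical in-memory stand-in for a large XML file
data = io.BytesIO(
    b"<tv>"
    + b"".join(b"<schedule id='%d'><title>Show</title></schedule>" % i for i in range(1000))
    + b"</tv>"
)

count = 0
for event, elem in etree.iterparse(data, events=("end",), tag="schedule"):
    count += 1  # process elem here, before it is cleared
    # Free the element's own children and text...
    elem.clear()
    # ...then delete already-processed siblings still held by the parent
    while elem.getprevious() is not None:
        del elem.getparent()[0]

print(count)  # 1000
```
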
33 votes, 4 answers

Error 'failed to load external entity' when using Python lxml

I'm trying to parse an XML document I retrieve from the web, but it crashes after parsing with this error: ': failed to load external entity "
daveeloo
32 votes, 2 answers

How do I use a default namespace in an lxml xpath query?

I have an xml document in the following format: ...
ewok
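
Here is a minimal sketch. XPath 1.0 has no default namespace, so an unprefixed step such as //entry never matches namespaced elements. The fix is to bind the document's default namespace URI to any prefix you like and use that prefix in the query. lxml's xpath() does not accept None as a prefix in the namespaces mapping, which is why a real prefix is needed. The Atom feed below is a hypothetical example document.

```python
from lxml import etree

# Hypothetical document whose elements live in a default (unprefixed) namespace
root = etree.fromstring(
    b'<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>One</title></entry></feed>'
)

# Bind the default namespace URI to an arbitrary prefix ("a") for the query
ns = {"a": "http://www.w3.org/2005/Atom"}
titles = root.xpath("//a:entry/a:title/text()", namespaces=ns)
print(titles)  # ['One']

# Without a prefix the step matches only elements in no namespace
print(root.xpath("//entry"))  # []
```
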