Questions tagged [lxml]

lxml is a full-featured, high-performance Python library for processing XML and HTML.

Questions that concern the lxml Python library should have this tag. Per the lxml website, "The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt." The library's lxml.etree package is used for XML processing, and lxml.html handles HTML. lxml can also use BeautifulSoup (via lxml.html.soupparser) to parse broken HTML, and html5lib (via lxml.html.html5parser) to parse with the HTML5 parsing algorithm.

Links:

https://lxml.de/ - Contains API documentation and tutorials

https://www.ibm.com/developerworks/xml/library/x-hiperfparse/ - IBM developerWorks page on lxml

5412 questions
69
votes
5 answers

Write xml file using lxml library in Python

I'm using lxml to create an XML file from scratch, with code like this: from lxml import etree root = etree.Element("root") root.set("interesting", "somewhat") child1 = etree.SubElement(root, "test") How do I write the root Element object to an…
systempuntoout
  • 71,966
  • 47
  • 171
  • 241
66
votes
2 answers

How to find recursively for a tag of XML using LXML?

Using lxml, is it possible to search recursively for the tag "f1"? I tried the findall method…
shahjapan
  • 13,637
  • 22
  • 74
  • 104
65
votes
2 answers

finding elements by attribute with lxml

I need to parse an XML file to extract some data. I only need some elements with certain attributes. Here's an example of the document:
some text
Jérôme Pigeot
  • 2,091
  • 4
  • 22
  • 25
64
votes
2 answers

selecting attribute values from lxml

I want to use an XPath expression to get the value of an attribute. I expected the following to work: from lxml import etree for customer in etree.parse('file.xml').getroot().findall('BOB'): print customer.find('./@NAME') but this gives an…
GHZ
  • 3,365
  • 4
  • 24
  • 28
62
votes
2 answers

What are the differences between lxml and ElementTree?

When it comes to generating XML data in Python, there are two libraries I often see recommended: lxml and ElementTree. From what I can tell, the two libraries are very similar to each other. They both seem to have similar module names, usage…
Stevoisiak
  • 23,794
  • 27
  • 122
  • 225
62
votes
7 answers

Parsing HTML in python - lxml or BeautifulSoup? Which of these is better for what kinds of purposes?

From what I can make out, the two main HTML parsing libraries in Python are lxml and BeautifulSoup. I've chosen BeautifulSoup for a project I'm working on, but I chose it for no particular reason other than finding the syntax a bit easier to learn…
Monika Sulik
  • 16,498
  • 15
  • 50
  • 52
56
votes
5 answers

lxml etree xmlparser remove unwanted namespace

I have an xml doc that I am trying to parse using Etree.lxml
1
some stuff My code is: path = "path to xml file" from…
Mark
  • 2,522
  • 5
  • 36
  • 42
55
votes
2 answers

Python: Using xpath locally / on a specific element

I'm trying to get the links from a page with xpath. The problem is that I only want the links inside a table, but if I apply the xpath expression on the whole page I'll capture links which I don't want. For example: tree =…
pvt pns
  • 553
  • 1
  • 4
  • 4
50
votes
3 answers

how to remove attribute of a etree Element?

I have an etree Element with some attributes. How can I delete an attribute from a particular etree Element?
shahjapan
  • 13,637
  • 22
  • 74
  • 104
49
votes
6 answers

How can I install lxml in docker

I want to deploy my Python project in Docker. I wrote lxml>=3.5.0 in the requirments.txt because the project needs lxml. Here is my Dockerfile: FROM gliderlabs/alpine:3.3 RUN set -x \ && buildDeps='\ python-dev \ py-pip \ …
thiiiiiking
  • 1,233
  • 3
  • 12
  • 16
44
votes
6 answers

Find python lxml version

How can I find the installed python-lxml version in a Linux system? >>> import lxml >>> lxml.__version__ Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: 'module' object has no attribute '__version__' >>>…
Niklas9
  • 8,816
  • 8
  • 37
  • 60
44
votes
4 answers

How to get path of an element in lxml?

I'm searching in an HTML document using XPath from lxml in Python. How can I get the path to a certain element? Here's the example from Ruby's Nokogiri: page.xpath('//text()').each do |textnode| path = textnode.path puts path end print for…
Fluffy
  • 27,504
  • 41
  • 151
  • 234
43
votes
10 answers

Remove namespace and prefix from xml in python using lxml

I have an XML file I need to open and make some changes to. One of those changes is to remove the namespace and prefix, and then save to another file. Here is the XML:
speedyrazor
  • 3,127
  • 7
  • 33
  • 51
41
votes
6 answers

Using Python Iterparse For Large XML Files

I need to write a parser in Python that can process some extremely large files ( > 2 GB ) on a computer without much memory (only 2 GB). I wanted to use iterparse in lxml to do it. My file is of the format: Item 1
Dave Johnshon
  • 475
  • 1
  • 7
  • 6
40
votes
2 answers

Pretty print in lxml is failing when I add tags to a parsed tree

I have an XML file that I'm using etree from lxml to work with, but when I add tags to it, pretty printing doesn't seem to work. >>> from lxml import etree >>> root = etree.parse('file.xml').getroot() >>> print etree.tostring(root, pretty_print =…
Kris Harper
  • 5,672
  • 8
  • 51
  • 96