Questions tagged [lxml]

lxml is a full-featured, high performance Python library for processing XML and HTML.

Questions that concern the lxml Python library should have this tag. Per the lxml website, "The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt." The library's lxml.etree module is used for XML processing. For broken HTML, lxml can delegate parsing to BeautifulSoup through lxml.html.soupparser, and it can use html5lib, which implements the HTML5 parsing algorithm, through lxml.html.html5parser.
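As a rough illustration of the XML processing mentioned above, here is a minimal sketch that parses a string, reads attributes, and serializes the tree again. The sample document and the variable names are made up for this example. The import falls back to the standard library's ElementTree, which offers a compatible API for these particular calls, so the snippet still runs where lxml is not installed.

```python
# Prefer lxml; fall back to the standard library's ElementTree, which
# supports the same calls used below, if lxml is not installed.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample document, invented for illustration only.
xml_text = "<catalog><book id='b1'>First</book><book id='b2'>Second</book></catalog>"

# Parse the string into an element tree and get its root element.
root = etree.fromstring(xml_text)

# Collect the id attribute and text content of every direct <book> child.
books = [(book.get("id"), book.text) for book in root.findall("book")]

# Serialize the tree back to a text string (encoding="unicode" yields str).
serialized = etree.tostring(root, encoding="unicode")

print(root.tag)
print(books)
print(serialized)
```

Only calls shared by lxml.etree and xml.etree.ElementTree are used here; lxml-specific features such as full XPath or pretty_print are left out of this sketch.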

Links:

https://lxml.de/ - Contains API documentation and tutorials

https://www.ibm.com/developerworks/xml/library/x-hiperfparse/ - IBM developerWorks page on lxml

5412 questions
18
votes
3 answers

Remove ns0 from XML

I have an XML file where I would like to edit certain attributes. I am able to properly edit the attributes but when I write the changes to the file, the tags have a strange "ns0" added onto them. How can I get rid of this? This is what I have tried…
user4500293
  • 621
  • 1
  • 7
  • 18
18
votes
2 answers

Saving XML using ETree in Python. It's not retaining namespaces, and adding ns0, ns1 and removing xmlns tags

I see there are similar questions here, but nothing that has totally helped me. I've also looked at the official documentation on namespaces but can't find anything that is really helping me, perhaps I'm just too new at XML formatting. I understand…
emmdee
  • 1,541
  • 3
  • 25
  • 46
18
votes
5 answers

XML Declaration standalone="yes" lxml

I have an XML file that I am parsing, making some changes to, and saving out to a new file. It has the declaration which I would like to keep. When I am saving out my new file I am losing the…
speedyrazor
  • 3,127
  • 7
  • 33
  • 51
18
votes
1 answer

lxml not adding newlines when inserting a new element into existing xml

I have a large set of existing xml files, and I am trying to add one element to all of them (they are pom.xml for a number of maven projects, and I am trying to add a parent element to all of them). The following is my exact code. The problem is…
Aman
  • 639
  • 1
  • 9
  • 25
18
votes
2 answers

Preserving original doctype and declaration of an lxml.etree parsed xml

I'm using Python's lxml and I'm trying to read an XML document, modify it, and write it back, but the original doctype and XML declaration disappear. I'm wondering if there's an easy way of putting them back in, whether through lxml or some other solution?
incognito2
  • 1,024
  • 3
  • 13
  • 20
18
votes
1 answer

How to test if an attribute exists in some XML

I have some XML that I am parsing in python via lxml. I am encountering situations where some elements have attributes and some don't. I need to extract them if they exist, but skip them if they don't - I'm currently landing with errors (as my…
Jay
  • 753
  • 3
  • 11
  • 19
18
votes
3 answers

Remove class attribute from HTML using Python and lxml

Question How do I remove class attributes from html using python and lxml? Example I have:

Lorem ipsum dolor sit amet, consectetur adipisicing elit

I want:

Lorem ipsum dolor sit amet, consectetur adipisicing…

Jeff
  • 3,879
  • 3
  • 26
  • 28
17
votes
3 answers

Is there an elegant way to count tag elements in a xml file using lxml in python?

I could read the content of the xml file to a string and use string operations to achieve this, but I guess there is a more elegant way to do this. Since I did not find a clue in the docs, I am asking here: Given an xml (see below) file, how do you…
Aufwind
  • 25,310
  • 38
  • 109
  • 154
17
votes
1 answer

pretty_print option in tostring not working in lxml

I'm trying to use the tostring method in XML to get a "pretty" version of my XML as a string. The example on the lxml site shows this example: >>> import lxml.etree as etree >>> root = etree.Element("root") >>> print(root.tag) root >>> root.append(…
lanteau
  • 255
  • 2
  • 7
17
votes
2 answers

using xpath to select an element after another

I've seen similar questions, but the solutions I've seen won't work on the following. I'm far from an XPath expert. I just need to parse some HTML. How can I select the table that follows Header 2. I thought my solution below should work, but…
jseabold
  • 7,903
  • 2
  • 39
  • 53
17
votes
6 answers

lxml: add namespace to input file

I am parsing an xml file generated by an external program. I would then like to add custom annotations to this file, using my own namespace. My input looks as below:
kai
  • 1,970
  • 2
  • 22
  • 30
16
votes
3 answers

Finding html element with class using lxml

I've searched everywhere and what I most found was doc.xpath('//element[@class="classname"]'), but this does not work no matter what I try. code I'm using import lxml.html def check(): data = urlopen('url').read(); return str(data); doc =…
Vexx
  • 161
  • 1
  • 1
  • 4
16
votes
2 answers

Need python lxml syntax help for parsing html

I am brand new to python, and I need some help with the syntax for finding and iterating through html tags using lxml. Here are the use-cases I am dealing with: HTML file is fairly well formed (but not perfect). Has multiple tables on screen, one…
Shaheeb Roshan
  • 611
  • 1
  • 7
  • 17
16
votes
3 answers

What does this error mean: invalid ELF header

I'm getting an IMPORT ERROR with the following error message in Django debug mode /usr/local/lib/python2.6/dist-packages/lxml-2.3-py2.6-win32.egg/lxml/objectify.pyd: invalid ELF header What does this mean and how do I fix it? Google is revealing not…
super9
  • 29,181
  • 39
  • 119
  • 172
16
votes
1 answer

lxml error on Windows - AttributeError: module 'lxml' has no attribute 'etree'

I am using Anaconda v4.2 with Python 3.5 on Windows 32 bit, and wanting to use lxml etree. My Anaconda distribution includes lxml 3.6.4, but the only lxml function that my IDE (PyCharm, although I'm getting the same error when running the code with…
user2497748
  • 171
  • 1
  • 1
  • 3