Questions tagged [lxml]

lxml is a full-featured, high performance Python library for processing XML and HTML.

Questions that concern the lxml Python library should have this tag. Per the lxml website, "The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt." The library's lxml.etree module is used for XML processing. For broken HTML, lxml.html.soupparser uses BeautifulSoup, and lxml.html.html5parser uses html5lib, which implements the HTML5 parsing algorithm.
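A minimal sketch of everyday lxml usage, assuming lxml is installed. The catalog document is made up for illustration. The sketch parses XML with lxml.etree, iterates over elements, serializes the tree, and shows lxml.html accepting an unclosed tag.

```python
from lxml import etree, html

# A small, made-up XML document used only for illustration.
xml = b"<catalog><book id='1'>Dune</book><book id='2'>Emma</book></catalog>"

root = etree.fromstring(xml)            # parse bytes into an Element
for book in root.iter("book"):          # every <book> element in the tree
    print(book.get("id"), book.text)

# Serialize back to bytes; pretty_print adds newlines and indentation.
print(etree.tostring(root, pretty_print=True).decode())

# lxml.html tolerates broken markup such as unclosed tags.
fragment = html.fromstring("<p>Unclosed <b>bold")
print(fragment.text_content())
```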

Links:

https://lxml.de/ - Contains API documentation and tutorials

https://www.ibm.com/developerworks/xml/library/x-hiperfparse/ - IBM developerWorks page on lxml

5412 questions
2 votes, 1 answer

Parsing XML using lxml, unable to get text when there is another child node

I am parsing an XML file, downloaded from the internet, using lxml. It has a structure similar to this: Some text in A node Some text in C node Some text in B node I want to print the text inside the…
sk11 (1,779 reputation)
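In lxml, text that follows a child element is stored as that child's tail, not in the parent's text, which is the usual cause of this symptom. A minimal sketch with a hypothetical mixed-content element (the tag names a and c are invented, since the question's markup was not preserved):

```python
from lxml import etree

# Hypothetical mixed-content element; tag names are invented for illustration.
a = etree.fromstring("<a>Text in A <c>text in C</c> more text in A</a>")

print(repr(a.text))         # only the text before the first child
c = a.find("c")
print(repr(c.text))         # the text inside <c>
print(repr(c.tail))         # the text after </c>, stored on c as its tail

# itertext() yields every text and tail in document order.
all_text = "".join(a.itertext())
print(all_text)

# Only A's own text: a.text plus the tail of each direct child.
own = (a.text or "") + "".join(child.tail or "" for child in a)
print(repr(own))
```

Use itertext() when you want every piece of text in document order; combine .text with each child's .tail when you want only the parent's own text.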
2 votes, 1 answer

lxml order of attributes

As discussed in the question "lxml preserves attributes order?", and following @abarnet's suggestion, I wrote the following line of code: root = ET.Element('{%s}Catalogo' % SATNS,…
Diego Calzadilla (309 reputation)
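lxml keeps attributes in the order they were added and writes them out in that order. A minimal sketch, with the SAT namespace omitted and hypothetical attribute names and values:

```python
from lxml import etree

# Namespace omitted for brevity; attribute names and values are hypothetical.
root = etree.Element("Catalogo")

# Each set() call adds the attribute after the existing ones,
# so serialization follows the order of the calls.
root.set("Version", "1.0")
root.set("RFC", "X")
root.set("Mes", "01")

print(etree.tostring(root))
```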
2 votes, 0 answers

How to apply CSS to XML in python?

I have an XML file that I would like to parse (e.g. using lxml.etree), apply a set of CSS styles to, and then export. For instance, the input would be this XML: and…
MemoryWrangler (335 reputation)
2 votes, 1 answer

No module named lxml.html

Running OS X 10.9.4, I'm trying to use Scrapy, but I get this error: Traceback (most recent call last): File "/usr/local/bin/scrapy", line 3, in <module> from scrapy.cmdline import execute File…
mobius (81 reputation)
2 votes, 2 answers

Add multiple of same xml subelements to existing parent element

I'm trying to take a comma-separated list [Action, Adventure, Family] and, for each item in the list, create a new tag inside of a tag. The desired output: Action Adventure
irncty (21 reputation)
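One way to build repeated child elements is to call etree.SubElement once per list item. A sketch, assuming hypothetical tag names genres and genre because the question's markup did not survive:

```python
from lxml import etree

genres = "Action, Adventure, Family"          # the comma-separated input
parent = etree.Element("genres")              # hypothetical parent tag name

for name in genres.split(","):
    # A new <genre> is appended to parent on every iteration.
    child = etree.SubElement(parent, "genre")  # hypothetical child tag name
    child.text = name.strip()                  # drop the space after each comma

print(etree.tostring(parent, pretty_print=True).decode())
```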
2 votes, 2 answers

Using DTDs to Parse XML

I'm attempting to parse the USPTO data that is hosted here. I have also retrieved the DTDs associated with the files. My question is: is it possible to use these to parse the files, or are they only used for validation? I have already used one as a…
drowningincode (1,115 reputation)
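DTDs are mainly used for validation, although lxml parser options such as load_dtd=True and attribute_defaults=True also let a DTD supply entity definitions and default attribute values while parsing. A minimal validation sketch using etree.DTD, with a tiny made-up DTD standing in for the much larger USPTO ones:

```python
from io import StringIO
from lxml import etree

# Tiny made-up DTD: a <patent> must contain exactly one <title> with text.
dtd = etree.DTD(StringIO("<!ELEMENT patent (title)>\n<!ELEMENT title (#PCDATA)>"))

good = etree.fromstring("<patent><title>Widget</title></patent>")
bad = etree.fromstring("<patent><claim/></patent>")   # well-formed, but invalid

good_ok = dtd.validate(good)
bad_ok = dtd.validate(bad)
print(good_ok, bad_ok)

# The error log describes why the last validation failed.
for err in dtd.error_log:
    print(err.message)
```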
2 votes, 0 answers

lxml - No module named lxml.etree

I am trying to install lxml and run into the following error after a successful pip install to my venv: ImportError: No module named lxml.etree etree.so is clearly in my venv, along with lxml.etree.h and lxml.etree_api.h. I have even tried…
ViktorSodd (39 reputation)
2 votes, 1 answer

using fromstring() with lxml prefixes

I have a variable ele. I'm trying to append a child node onto ele that contains a namespace prefix (called style) in its tag. ele seems to be aware of this prefix, as the…
Fred the Fantastic (1,295 reputation)
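fromstring() parses its argument as a standalone document, so a prefix declared only on ele is unknown to it. A sketch of one workaround: build the child with Clark notation ({uri}tag) so lxml reuses the prefix already declared on the parent. The namespace URI and the tag names below are hypothetical.

```python
from lxml import etree

STYLE_NS = "urn:example:style"        # hypothetical namespace URI
ele = etree.Element("doc", nsmap={"style": STYLE_NS})

# Parsed on its own, the fragment has no declaration for the "style" prefix.
try:
    etree.fromstring("<style:font/>")
    failed = False
except etree.XMLSyntaxError as exc:
    failed = True
    print("parse error:", exc)

# Clark notation: lxml finds STYLE_NS in ele's nsmap and reuses "style".
child = etree.SubElement(ele, "{%s}font" % STYLE_NS)
print(etree.tostring(ele))
```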
2 votes, 1 answer

I need a polyfill for objectify.SubElement

I am trying to use python-pptx on Google App Engine (to create a PowerPoint file). I don't need images, so I just commented out the dependencies on Pillow. That left me with something that almost works, except I have a version problem. The version…
Joshua Smith (3,689 reputation)
2 votes, 1 answer

Removing img tag in lxml

I have this code: from lxml.html import fromstring, tostring html = " Here is some text " doc = fromstring(html) img = doc.find('.//img') doc.remove(img) print tostring(doc) And the output is: Why does…
rmacqueen (971 reputation)
2 votes, 2 answers

Should I strip the XML declaration from suds output before parsing with lxml?

I’m trying to implement a SOAP webservice in Python 2.6 using the suds library. That is working well, but I’ve run into a problem when trying to parse the output with lxml. Suds returns a suds.sax.text.Text object with the reply from the SOAP…
mikl (23,749 reputation)
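lxml refuses a Python text string that carries an encoding declaration and raises ValueError. Rather than stripping the declaration, one option is to encode the text to bytes first. A sketch in Python 3 (the question uses Python 2.6, where the same idea applies to unicode objects); the reply string is a made-up stand-in for the suds output:

```python
from lxml import etree

# Made-up stand-in for the text suds returns, including an XML declaration.
reply = '<?xml version="1.0" encoding="UTF-8"?><Envelope><Body>ok</Body></Envelope>'

try:
    etree.fromstring(reply)               # str + encoding declaration is rejected
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# As bytes, the declaration is honoured instead of causing an error.
root = etree.fromstring(reply.encode("utf-8"))
print(root.find("Body").text)
```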
2 votes, 1 answer

Syntax troubles with lxml xpath

I'm having difficulty understanding the proper syntax to use to get at single elements when parsing XML in Python with lxml. when I do this: print self.root.xpath("descendant::*[@Name='GevCCP']/*",namespaces=self.nsmap) I get a list of subordinate…
Octopus (8,075 reputation)
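xpath() returns a list for node-set expressions (and a string, number, or boolean for other expressions), so a single element has to be indexed out. A sketch with a hypothetical document; namespaces are left out so the expression from the question can be used unchanged:

```python
from lxml import etree

# Hypothetical document shaped loosely like the one in the question.
xml = b"""<root>
  <Group Name="GevCCP"><Value>1</Value><Value>2</Value></Group>
</root>"""
root = etree.fromstring(xml)

matches = root.xpath("descendant::*[@Name='GevCCP']/*")
print(len(matches))                  # a list, even for one match

first = matches[0]                   # index into the list for one element
print(first.text)

# string() makes XPath return the first match's text as a string.
text = root.xpath("string(descendant::*[@Name='GevCCP']/*[1])")
print(text)
```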
2 votes, 1 answer

Parse XHTML5 with undefined entities

Please consider this: import xml.etree.ElementTree as ET xhtml = '''
theta (24,593 reputation)
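xml.etree.ElementTree only knows the five predefined XML entities, so HTML named entities such as &nbsp; are undefined without a DTD. One alternative, sketched here, is lxml's HTML parser, which resolves HTML entities; note that it parses the input as HTML rather than XML, so XHTML namespace handling differs. The snippet is hypothetical:

```python
import xml.etree.ElementTree as ET
from lxml import html

snippet = "<p>caf&eacute;&nbsp;au lait</p>"   # HTML named entities, no DTD

# The stdlib XML parser rejects entities it has no definition for.
try:
    ET.fromstring(snippet)
    failed = False
except ET.ParseError as exc:
    failed = True
    print("ElementTree:", exc)

# lxml's HTML parser knows the HTML entity set and decodes them.
p = html.fromstring(snippet)
print(p.text_content())
```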
2 votes, 1 answer

Parsing XML file with lxml in Python

I need to parse an XML file, let's say called example.xml, that looks like the following:
dragon (105 reputation)
2 votes, 1 answer

XPath invalid expression when tag name has curly braces

The name of the tag I am trying to get to is {http://whitehatsec.com/XML-api-Vuln}description. Conveniently, every tag is prefixed with that lovely reference to the whitehat website. Unfortunately, the XPath in lxml doesn't like it. I am currently…
thaweatherman (1,467 reputation)
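XPath 1.0 has no syntax for {uri}tag (Clark notation), which is why lxml reports an invalid expression. A sketch of two workarounds: bind a prefix of your choosing to the namespace URI and pass it via namespaces=, or use find()/findall(), whose ElementPath syntax accepts Clark notation. The document below is hypothetical; only the namespace URI comes from the question.

```python
from lxml import etree

NS = "http://whitehatsec.com/XML-api-Vuln"   # namespace URI from the question
# Hypothetical document that uses NS as its default namespace.
xml = ('<vulns xmlns="%s"><vuln><description>XSS</description></vuln></vulns>' % NS).encode()
root = etree.fromstring(xml)

# Clark notation inside an XPath expression is a syntax error.
try:
    root.xpath("//{%s}description" % NS)
    failed = False
except etree.XPathError as exc:
    failed = True
    print("XPath error:", exc)

# Workaround 1: map an arbitrary prefix ("w") to the URI.
found = root.xpath("//w:description", namespaces={"w": NS})
print(found[0].text)

# Workaround 2: ElementPath (find/findall) understands {uri}tag.
same = root.find(".//{%s}description" % NS)
print(same.text)
```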