Questions tagged [lxml]

lxml is a full-featured, high performance Python library for processing XML and HTML.

Questions that concern the lxml Python library should have this tag. Per the lxml website, "The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt." The library's lxml.etree module is used for XML processing. lxml can also use BeautifulSoup (via lxml.html.soupparser) to parse broken HTML, and html5lib (via lxml.html.html5parser) to apply the HTML5 parsing algorithm.

Links:

https://lxml.de/ - Contains API documentation and tutorials

https://www.ibm.com/developerworks/xml/library/x-hiperfparse/ - IBM developerWorks page on lxml

5412 questions

2 answers

CxFreeze is not recognizing certain imports

After building my executable with cx_Freeze and trying to run the .exe, I get this error. I understand this means cx_Freeze is not recognizing lxml. However, I have tried to include this in my setup.py. Traceback (most recent call last): File…

asked Jun 23 '14 at 12:34

sudobangbang

1,406
10
32
55

1 answer

Getting XML attribute value with lxml module

How can I get the value of an attribute in an XML file with the lxml module? My XML looks like this: somename 0.456 0.4 …

python xml parsing lxml

asked Jun 16 '14 at 10:33

Pythonizer

1,080
4
15
25
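
For the attribute question above, a minimal sketch of the usual approaches. The XML in the excerpt was lost, so the document, the element name `item`, and the attribute names below are hypothetical stand-ins:

```python
from lxml import etree

# Hypothetical sample; the asker's real XML is not shown in the excerpt.
xml = b'<items><item name="somename" score="0.456"/></items>'
root = etree.fromstring(xml)

item = root.find("item")

# Element.get() returns the attribute value, or a default if it is absent.
name = item.get("name")
missing = item.get("nope", "n/a")

# XPath can select attribute values directly; the result is a list of strings.
names = root.xpath("//item/@name")

print(name, missing, names)
```

`item.attrib` also exposes all attributes as a dict-like mapping when several values are needed at once.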

3 answers

Best way to get back to using the power of lxml after having to use a regex to find something in an html document

I am trying to rip some text out of a large number of html documents (numbers in the hundreds of thousands). The documents are really forms but they are prepared by a very large group of different organizations so there is significant variation in…

python regex html-parsing lxml

asked Mar 10 '10 at 23:13

PyNEwbie

4,882
4
38
86

2 answers

How to replace an HTML tag with text inside an lxml iterwalk loop

I'm iterating through an HTML tree with lxml iterwalk and I'd like to replace all … tags inside … with new line characters. That's what I have so far: root = lxml.html.fromstring(text) for action, el in etree.iterwalk(root): if…

python html replace html-parsing lxml

asked Jun 09 '14 at 13:29

Simon Steinberger

6,605
5
55
97
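
The tag names are missing from the excerpt above. Assuming the goal is to turn `br` elements into newline characters, one possible sketch avoids mutating the tree during `iterwalk` and instead collects the elements first, then uses `drop_tag()` from `lxml.html`. The helper name `br_to_newline` and the sample markup are hypothetical:

```python
import lxml.html

def br_to_newline(root):
    """Replace every <br> under root with a newline character (assumed goal)."""
    # Materialize the list first so removing elements does not disturb iteration.
    for br in list(root.iter("br")):
        # Put the newline in front of whatever text followed the <br>.
        br.tail = "\n" + (br.tail or "")
        # drop_tag() removes the element but merges its tail into the parent.
        br.drop_tag()

root = lxml.html.fromstring("<p>a<br>b<br/>c</p>")
br_to_newline(root)
print(repr(root.text_content()))
```

This is a different technique from the `iterwalk` loop in the question, chosen because removing elements while walking the tree is easy to get wrong.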

2 answers

Get XPath of an element in DOM tree?

I'm using the lxml implementation in Python for HTML and XML parsing. Setting up a parser like parser = lxml.etree.HTMLParser() and returning a tree from HTML source (string) tree = lxml.etree.fromstring(html, parser).getroottree() # Returns a XML…

python dom selenium xpath lxml

asked Jun 02 '14 at 15:48

user3623152
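
For the XPath question above, lxml's `ElementTree` objects provide `getpath()`, which returns a structural XPath for a given element. A short sketch, using a hypothetical HTML string in place of the asker's source:

```python
from lxml import etree

# Hypothetical HTML; the asker's real document is not shown.
html = "<html><body><div><p>one</p><p>two</p></div></body></html>"

parser = etree.HTMLParser()
tree = etree.fromstring(html, parser).getroottree()

second_p = tree.xpath("//p")[1]

# getpath() builds an absolute, position-based XPath for the element.
path = tree.getpath(second_p)
print(path)

# Evaluating the path on the same tree leads back to the same node.
found = tree.xpath(path)[0]
print(found.text)
```

The returned path is positional, so it is only stable as long as the document structure does not change.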

1 answer

lxml: Append 'None' or Null value when html tag text content is None

I'm trying to read HTML content and extract the last table's contents into an array using lxml. Here is my last table: …

python google-app-engine lxml

asked May 13 '14 at 08:03

Nijin Narayanan

2,269
2
27
46
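
The table from the question above is not shown, so this is only a guess at the situation: an XPath `text()` query silently skips empty cells, while reading `.text` per element keeps a `None` placeholder for them. The sample table is hypothetical:

```python
import lxml.html

# Hypothetical table with one empty cell.
html = "<table><tr><td>a</td><td></td><td>c</td></tr></table>"
table = lxml.html.fromstring(html)

# text() only yields text nodes that exist, so the empty cell disappears.
xpath_texts = table.xpath(".//td/text()")

# Reading .text on each cell keeps one entry per cell; empty cells give None.
rows = [[td.text for td in tr.iter("td")] for tr in table.iter("tr")]

print(xpath_texts)
print(rows)
```

Note that `.text` only covers text before the first child element; `td.text_content()` may be a better fit for cells with nested markup, though it returns an empty string rather than `None`.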

1 answer

How do I scrape an https page?

I'm using a python script with 'lxml' and 'requests' to scrape a web page. My goal is to grab an element from a page and download it, but the content is on an HTTPS page and I'm getting an error when trying to access the stuff in the page. I'm sure…

python lxml scrape

asked May 01 '14 at 21:06

kevingduck

0 answers

XML indentation set to 4 spaces

I'm using the following code to indent, as mentioned here: parser = etree.XMLParser(remove_blank_text=True) tree = etree.parse(filename, parser) However, the original XML file is indented with 4 spaces and after using the code above it indents to 2…

python xml lxml indentation

asked Apr 11 '14 at 17:05

bulkmoustache

1,875
3
20
24
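
For the indentation question above, one option in newer lxml releases (4.5 and later) is `etree.indent()`, which rewrites whitespace in place with a chosen indent string. A sketch with a hypothetical in-memory document instead of the asker's file:

```python
from lxml import etree

# Hypothetical input, indented with 2 spaces.
xml = b"<root>\n  <a>\n    <b>1</b>\n  </a>\n</root>"

# Drop the existing whitespace-only text nodes, as in the question.
parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(xml, parser)

# Re-indent in place with 4 spaces per level (requires lxml >= 4.5).
etree.indent(root, space="    ")

out = etree.tostring(root, encoding="unicode")
print(out)
```

On older lxml versions `pretty_print=True` is fixed at 2 spaces, so a custom indent would need manual handling of `text` and `tail`.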

1 answer

Printing out messages from a lxml error log in UTF-8 format

I'm learning Python (version 2.7) and have a task to validate an XML document against an XSD schema using the lxml library (http://lxml.de/). I have two files, like these examples: $ cat 1.xml

python xml parsing xsd lxml

asked Apr 10 '14 at 22:08

dmgl
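
For the validation question above, a minimal sketch of validating against an XSD and reading the error log. The schema and document are hypothetical stand-ins for the asker's files, and the code is written for Python 3 rather than the 2.7 mentioned in the question:

```python
from lxml import etree

# Hypothetical schema: a single <a> element that must hold an integer.
xsd = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a" type="xs:integer"/>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(xsd))

# Hypothetical invalid document.
doc = etree.fromstring(b"<a>not a number</a>")

# validate() returns a bool and fills schema.error_log on failure.
is_valid = schema.validate(doc)

# Each log entry carries, among other fields, a line number and a message.
messages = ["%d: %s" % (e.line, e.message) for e in schema.error_log]

# Encode explicitly when UTF-8 bytes are needed, e.g. for writing to a file.
encoded = "\n".join(messages).encode("utf-8")

print(is_valid)
print(encoded.decode("utf-8"))
```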

2 answers

Crawling tables from webpage

I'm trying to extract CSU employee salary data from this webpage (http://www.sacbee.com/statepay/#req=employee%2Fsearch%2Fname%3D%2Fyear%3D2013%2Fdepartment%3DCSU%20Sacramento). I've tried using the urllib2 and requests libraries, but neither of them returned…

python html web-crawler lxml scrape

asked Apr 08 '14 at 21:58

jinlong

1 answer

lxml, add SubElement to SubElement

I've created an XML document that looks like this: false …

python xml lxml

asked Apr 07 '14 at 20:01

ErikSorensen
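
For the SubElement question above: `etree.SubElement()` returns the new element, so that return value can itself be used as the parent of a further `SubElement()` call. The element names below are hypothetical, since the asker's XML was lost from the excerpt:

```python
from lxml import etree

root = etree.Element("root")

# SubElement() creates the child, appends it to the parent, and returns it.
child = etree.SubElement(root, "child")

# The returned element can be the parent of the next level.
grandchild = etree.SubElement(child, "grandchild")
grandchild.text = "false"

out = etree.tostring(root, encoding="unicode")
print(out)
```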

2 answers

Issue with parsing list of HTML with lxml and requests

I have a list of URLs stored in a variable href. When I pass it through the below function, the only returned recipe_links come from the first URL in href. Are there any glaring errors with my code? I'm not sure why it wouldn't loop through all 20…

python html html-parsing lxml python-requests

asked Apr 04 '14 at 18:34

metersk

11,803
21
63
100

2 answers

Unicode: Python / lxml file output not as expected (print vs write)

I'm parsing an xml file using the code below: import lxml file_name = input('Enter the file name, including .xml extension: ') print('Parsing ' + file_name) from lxml import etree parser = lxml.etree.XMLParser() tree =…

python xml unicode utf-8 lxml

asked Apr 03 '14 at 13:15

Nick
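
The code in the question above is truncated, so the following is only a sketch of one common source of print-versus-write differences: `etree.tostring()` produces different output depending on its `encoding` argument. The sample document is hypothetical:

```python
from lxml import etree

# Hypothetical document containing a non-ASCII character (e with acute accent).
root = etree.fromstring("<a>caf\u00e9</a>")

# Default: ASCII bytes, with non-ASCII characters as numeric references.
as_default = etree.tostring(root)

# Explicit UTF-8: bytes holding the encoded character itself.
as_utf8 = etree.tostring(root, encoding="utf-8")

# encoding="unicode": a Python str rather than bytes.
as_text = etree.tostring(root, encoding="unicode")

print(as_default)
print(as_utf8)
print(as_text)
```

When writing to a file, bytes results go with a file opened in binary mode, while the `str` result goes with text mode and an explicit `encoding="utf-8"`.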

2 answers

Removing all children tags past a specific depth

Take some rudimentary HTML like this as an example. How could one remove all child nodes past, say, 2 levels deep, so that anything deeper is truncated and removed? …

python html beautifulsoup lxml

asked Apr 01 '14 at 03:34

ATMA
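
One way to read the depth question above is "drop every element nested more than N levels below the root". Under that assumption, a recursive sketch might look like this; the helper name `prune` and the sample markup are hypothetical:

```python
import lxml.html

def prune(el, max_depth, depth=0):
    """Remove all descendants of el that sit deeper than max_depth levels."""
    # Copy the child list so removal does not disturb iteration.
    for child in list(el):
        if depth + 1 > max_depth:
            # remove() drops the child together with its tail text.
            el.remove(child)
        else:
            prune(child, max_depth, depth + 1)

# Hypothetical HTML: div (depth 0) > p (1) > span (2) > b (3).
root = lxml.html.fromstring("<div><p><span><b>x</b></span></p></div>")
prune(root, max_depth=2)

out = lxml.html.tostring(root, encoding="unicode")
print(out)
```

If the text following a removed element should survive, it would need to be copied from the child's `tail` before calling `remove()`.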

3 answers

python xml xpath query using tag and attribute with ns

I must be doing something inherently wrong here; every example I've seen and searched for on SO seems to suggest this would work. I'm trying to use an XPath search with the lxml etree library to parse a Garmin TCX file:

python xml xpath xml-parsing lxml

asked Mar 17 '14 at 20:18

kikixx
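
The query from the question above is not shown, but a frequent cause of empty XPath results on TCX files is the default namespace: XPath has no notion of a default namespace, so element names need a prefix mapped via `namespaces=`. The sample document, the prefix `tcx`, and the namespace URI below are assumptions for illustration:

```python
from lxml import etree

# Assumed TCX namespace URI; check the root element of the real file.
NS = {"tcx": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"}

# Hypothetical, heavily trimmed document.
xml = b"""<TrainingCenterDatabase
    xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Activities>
    <Activity Sport="Running"><Id>example-id</Id></Activity>
  </Activities>
</TrainingCenterDatabase>"""

root = etree.fromstring(xml)

# Without a prefix the name is looked up in no namespace, so nothing matches.
unprefixed = root.xpath("//Activity[@Sport='Running']")

# With a mapped prefix the element is found; the attribute has no namespace.
prefixed = root.xpath("//tcx:Activity[@Sport='Running']", namespaces=NS)

print(len(unprefixed), len(prefixed))
```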
