Questions tagged [lxml]

lxml is a full-featured, high-performance Python library for processing XML and HTML.

Questions that concern the lxml Python library should have this tag. Per the lxml website, "The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt." The library's lxml.etree module is used for XML processing, and lxml.html is used for HTML. lxml.html.soupparser uses BeautifulSoup to parse broken HTML, and lxml.html.html5parser uses html5lib to apply the HTML5 parsing algorithm.

Links:

https://lxml.de/ - Contains API documentation and tutorials

https://www.ibm.com/developerworks/xml/library/x-hiperfparse/ - IBM developerWorks page on lxml

5412 questions

1 answer

lxml xpath can not handle tag

How can I get the p tag's text "Blahblah" in this situation? When the p tag's text comes after a strong tag, lxml does not recognize it.

ccBlahblah

====code==== from lxml import html content="""

html lxml

asked Mar 19 '15 at 14:13

babayetu
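
For the question above, a minimal sketch of how ElementTree-style APIs expose text that follows a child element: it is stored as the child's tail, not as the parent's text. The markup here is a hypothetical reconstruction, since the original snippet did not survive, and the standard-library fallback import is only there so that the sketch runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # the stdlib module shares this part of the API
    import xml.etree.ElementTree as etree

# Hypothetical markup: text that follows a <strong> child inside a <p>.
p = etree.fromstring("<p><strong>cc</strong>Blahblah</p>")
strong = p.find("strong")

print(p.text)        # None: nothing precedes the first child
print(strong.text)   # text inside <strong>
print(strong.tail)   # text after </strong>, still inside <p>

# itertext() walks text and tail strings in document order.
print("".join(p.itertext()))
```

Reading strong.tail, or joining itertext(), recovers the trailing text without any string slicing.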

0 answers

__init__() got an unexpected keyword argument 'convertEntities'

I'm getting the error in the title when trying to parse HTML with soupparser, the external interface to the BeautifulSoup HTML parser. This is my code: from lxml.html.soupparser import fromstring fromstring(""); Also, since I'm…

python beautifulsoup lxml anaconda

asked Mar 12 '15 at 00:23

Tommz

1 answer

Scraping IMDb Review Page with lxml and requests package

I want to extract the user reviews of a particular movie with the help of lxml. Before that, I need to find out the number of reviews. An example review page is Interstellar. I found the XPath where User Reviews are found with the help of Firebug:…

python lxml lxml.html

asked Mar 05 '15 at 08:46

GokuShanth

2 answers

python lxml.html.parse not reading url

Why does html.parse(url) fail, when using requests and then html.fromstring works, and html.parse(url2) works? lxml 3.4.2 Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit (AMD64)] on win32 Type "copyright", "credits" or "license()"…

python lxml python-requests

asked Mar 02 '15 at 00:55

foosion

1 answer

Scraping multiple urls in parallel and inserting lxml element in queue

I am parsing multiple pages at once using the lxml module with this piece of code: def read_and_parse_url(url, queue): """ Read and parse the url """ data = urllib2.urlopen(url).read() root = lxml.html.fromstring(data) …

python multithreading queue multiprocessing lxml

asked Feb 13 '15 at 00:48

Thiago
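
The excerpt above is cut off before the actual problem, so the following is only a sketch of one common workaround when parsed trees have to travel through a queue: enqueue the serialized markup and re-parse it on the consuming side. The helper names parse_and_enqueue and drain are hypothetical, the inline strings stand in for downloaded pages, and the standard-library fallback import is there only so that the sketch runs without lxml.

```python
import queue

try:
    from lxml import etree
except ImportError:  # the stdlib module shares this part of the API
    import xml.etree.ElementTree as etree


def parse_and_enqueue(markup, out_queue):
    """Parse markup, then enqueue a serialized copy instead of the element."""
    root = etree.fromstring(markup)
    out_queue.put(etree.tostring(root))  # plain bytes are easy to hand over


def drain(out_queue):
    """Re-parse every serialized tree that is waiting in the queue."""
    roots = []
    while not out_queue.empty():
        roots.append(etree.fromstring(out_queue.get()))
    return roots


# Stand-ins for fetched pages; no network access is involved.
pages = (
    "<html><body><p>one</p></body></html>",
    "<html><body><p>two</p></body></html>",
)
results = queue.Queue()
for page in pages:
    parse_and_enqueue(page, results)

texts = [root.findtext("body/p") for root in drain(results)]
print(texts)
```

Re-parsing costs some time, but the queue then carries only bytes, which do not depend on the producer's in-memory tree.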

1 answer

how to write the opening of an xml doc in lxml?

I'm using lxml to write out a cXML file, but I can't figure out how to get it to write out the opening along with the doctype following it. When I started this, I started straight in on the document itself,…

python xml lxml cxml

asked Feb 11 '15 at 10:35

Bendustries
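
As a hedged illustration for the question above: lxml's etree.tostring accepts xml_declaration and doctype keyword arguments, which place a declaration and a DOCTYPE line ahead of the root element. The element names, the payloadID value, and the DTD URL below are placeholders, not taken from the question.

```python
from lxml import etree

# Hypothetical root element and DTD URL, used only for illustration.
root = etree.Element("cXML", payloadID="example-1")
etree.SubElement(root, "Header")

xml_bytes = etree.tostring(
    root,
    xml_declaration=True,   # emit the <?xml ...?> line
    encoding="UTF-8",
    doctype='<!DOCTYPE cXML SYSTEM "http://example.com/cXML.dtd">',
    pretty_print=True,
)
print(xml_bytes.decode("utf-8"))
```

The result is a bytes object, so it can be written to a file opened in binary mode.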

1 answer

Help with parsing lxml

To implement a college project, I need to handle XML files. For this I chose lxml after doing some research. However, I can't seem to find a good tutorial to help me get started. I can't choose most specifically which type of parsing I need to…

python lxml

asked May 14 '10 at 20:47

user225312

1 answer

lxml unicode entity parse problems

I'm using lxml as follows to parse an exported XML file from another system: xmldoc = open(filename) etree.parse(xmldoc) But I'm getting: lxml.etree.XMLSyntaxError: Entity 'eacute' not defined, line 4495, column 46 Obviously it's having…

python xml unicode lxml

asked May 14 '10 at 14:53

Jon Hadley
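
Some background for the error above: XML itself predefines only five entities (lt, gt, amp, apos, quot), while eacute comes from HTML. One possible fix, sketched here under the assumption that the exported file can be given a DOCTYPE, is to declare the entity in an internal DTD subset. The document content is made up for illustration, and the standard-library fallback import is there only so that the sketch runs without lxml.

```python
try:
    from lxml import etree
except ImportError:  # the stdlib module shares this part of the API
    import xml.etree.ElementTree as etree

# Hypothetical document: the internal subset declares the HTML entity
# that plain XML does not know about.
document = (
    '<!DOCTYPE doc [<!ENTITY eacute "&#233;">]>'
    "<doc>caf&eacute;</doc>"
)

root = etree.fromstring(document)
print(root.text)  # the entity reference is expanded during parsing
```

Without the ENTITY declaration, the same reference is an undefined entity and the parser rejects the document.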

1 answer

How do I require that an element has either one set of attributes or another in an XSD schema?

I'm working with an XML document where a tag must either have one set of attributes or another. For example, it needs to either look like or e.g.

python xml validation schema lxml

asked May 10 '10 at 22:27

Eli Courtwright

0 answers

Combining tail and pretty_print in lxml

As soon as I modify the tail of an element (default is None), writing with pretty_print deletes all indentation. Everything is on a single line. Is combining pretty_print and tail not possible? Example: from lxml import etree as et root =…

xml lxml indentation

asked Jan 21 '15 at 09:49

Eric H.

2 answers

Regular expression works normally, but fails when placed in an XML schema

I have a simple doc.xml file which contains a single root element with a Timestamp attribute: I'd like to validate this document against my simple schema.xsd to…

python regex validation schema lxml

asked May 10 '10 at 20:54

Eli Courtwright

1 answer

Should Python 2.6 on OS X deal with multiple easy-install.pth files in $PYTHONPATH?

I am running ipython from sage and also am using some packages that aren't in sage (lxml, argparse) which are installed in my home directory. I have therefore ended up with a $PYTHONPATH of $HOME/sage/local/lib/python:$HOME/lib/python Python is…

python lxml easy-install

asked May 08 '10 at 12:50

ahd

3 answers

Output of lxml in Python 2.7

This might be a completely foolish question, but searching Google was to no avail. First, of course, importing the libraries I need: from lxml import html from lxml import etree import requests Simple enough. Now to run and parse some code. The link in this…

python python-2.7 lxml lxml.html

asked Jan 09 '15 at 02:38

Ruhpun

1 answer

How do I do thread-safe python XML validation?

Using Python 3.3, I need to validate XML documents against their DTDs or XSDs, and I expect to validate many documents against each specification. I will have a multi-threaded application performing the validation. lxml documentation explains how…

python xml multithreading validation lxml

asked Jan 07 '15 at 16:38

Jim Scarborough

3 answers

Python - Requests: Correctly Using Params?

Before I begin, may I just say, I am very new to general communication with the web in code. With that said, could anyone assist me in getting these parameters, 'a': stMonth, 'b': stDate, 'c': stYear, 'd': enMonth, …

python html request lxml lxml.html

asked Dec 30 '14 at 08:15

The Novice
