Questions tagged [lxml]

lxml is a full-featured, high performance Python library for processing XML and HTML.

Questions that concern the lxml Python library should have this tag. Per the XML website, "The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt." The library's lxml.etree package is used for XML processing. lxml's BeautifulSoup package parses broken HTML. html5lib uses the HTML5 parsing algorithm.

Links:

https://lxml.de/ - Contains API documentation and tutorials

https://www.ibm.com/developerworks/xml/library/x-hiperfparse/ - IBM developerWorks page on lxml

5412 questions

votes

3 answers

Unable to install Python Scrapy (Lxml) on Windows

I was trying to install Python Scrapy library but when it's trying to install Lxml library, this error appears: Requirement already up-to-date: pip in c:\python34\lib\site-packages Collecting lxml Using cached lxml-3.4.4.tar.gz Complete output…

asked May 07 '15 at 17:25

Alejandra

votes

0 answers

ET.parse(filePath) is slower when run through different Python than one in Visual Studio

I run a python script (Python 2.7) that is computationally heavy and does some xml input/output operations. When I tested its elements in Python Interactive in VS they ran 2x faster (or more) than when I packed it into the main routine and ran it by…

python performance python-2.7 lxml

asked Apr 30 '15 at 10:16

jakub

votes

1 answer

Lxml : Ampersand in text

I have a problem using lxml I am using lxml to parse an xml file and again write it back to a new xml file. Input file: " example text " " example text " …

python xml parsing lxml ampersand

asked Apr 28 '15 at 14:54

radha shankar

votes

1 answer

How do I get all content between two html tags in Python?

I try to extract all content (tags and text) from one main tag on html page. For example: `my_html_page = '''

Some text …

python xml xpath beautifulsoup lxml

asked Apr 28 '15 at 12:38

Dim

votes

3 answers

Using "if" as argument identifier

I want to generate the following xml file: I've tried this: from lxml import etree etree.Element("foo", if="bar") But I got this error: page = etree.Element("configuration", if="ok") …

python lxml

asked Apr 22 '15 at 07:16

Brenno Martins

votes

2 answers

lxml.etree iterparse() and parsing element completely

I have an XML file with nodes that looks like this: 41.3681107 2015-04-11T03:52:33.000Z 3.9598 I am using lxml.etree.iterparse() to iteratively parse…

python lxml elementtree iterparse

asked Apr 17 '15 at 02:34

Andreas

votes

1 answer

"lxml.etree.XPathEvalError: Invalid expression" with Unicode element names

lxml nicely supports Unicode element names, as they are valid according to XML specification. But using Unicode in XPath produces an error: >>> import lxml.etree >>> e = lxml.etree.fromstring('

python xpath unicode lxml

asked Apr 17 '15 at 02:14

alexanderlukanin13

4,577
26
29

votes

1 answer

Stop pyquery inserting spaces where there aren't any in source HTML?

I am trying to get some text from an element, using pyquery 1.2. There are no spaces in the displayed text, but pyquery is inserting spaces. Here is my code: from pyquery import PyQuery as pq html = '

python lxml pyquery

asked Apr 13 '15 at 10:17
Richard

62,943

126

334

542

votes

1 answer

How to fix lxml assertion error

I have an ubuntu machine running pythong.2.7.6. When I try using lxml, which has been installed using pip, I get the following error: Traceback (most recent call last): File "./export.py", line 44, in fetch_item root.append(elem) File…

python lxml

asked Apr 10 '15 at 21:08

David542

104,438
178
489
842

votes

2 answers

lxml doesn't get all text in element if text has
?

I am using lxml to parse web document, I want to get all the text in a

element, so I use the code as follow: from lxml import etree page = etree.HTML("

test1
test2

") print page.xpath("//p")[0].text # this just print…

python text lxml elementtree

asked Apr 10 '15 at 07:12

roger

9,063
20
72
119

votes

3 answers

How to get textarea value with lxml python

With this python code i can get whole html source import mechanize import lxml.html import StringIO br = mechanize.Browser() br.set_handle_robots(False) br.addheaders = [("User-agent","Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13)…

python lxml lxml.html

asked Apr 08 '15 at 10:24

Dark Cyber

2,181
7
44
68

votes

2 answers

Python 3.4.0 -- 'ascii' codec can't encode characters in position 11-15: ordinal not in range(128) -- Unix 14.04

Trying to retrieve some data from the web using urlib and lxml, I've got an error and have no idea, how to fix it. url='http://sum.in.ua/?swrd=автор' page = urllib.request.urlopen(url) The error itself: UnicodeEncodeError: 'ascii' codec can't…

python encoding utf-8 ascii lxml

asked Apr 03 '15 at 16:20

Khrystyna

votes

3 answers

How to select text without the HTML markup

I am working on a web scraper (using Python), so I have a chunk of HTML from which I am trying to extract text. One of the snippets looks something like this:

This class has some text and a few

python html xpath web-scraping lxml

asked Apr 01 '15 at 19:02

Yuka

votes

1 answer

Using lxml to Validate HTML

I am trying to use lxml to validate a piece of HTML but it complains that the fragment is invalid even though it should be valid: img = """""" parser =…

html validation lxml lxml.html

asked Mar 27 '15 at 03:57

Alex Rothberg

10,243
13
60
120

votes

1 answer

Creating an element with 'class' attribute throws a syntax error

When I try to do this with the lxml module: div = etree.SubElement(body, "div", class="hmi") I get a: user@localhost:metk $ sudo python mbscan.py -r 192.168.0.0/24 --hmi File "mbscan.py", line 481 div = etree.SubElement(body, "div",…

python lxml

asked Mar 22 '15 at 01:51

Juicy

11,840
35
123
212

Prev 1 2 3

…

99 100 Next

Questions tagged [lxml]

python lxml pyquery asked Apr 13 '15 at 10:17 Richard 62,943 126 334 542

python lxml pyquery

asked Apr 13 '15 at 10:17
Richard

62,943

126

334

542