Questions tagged [lxml]

lxml is a full-featured, high performance Python library for processing XML and HTML.

Questions that concern the lxml Python library should have this tag. Per the lxml website, "The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt." The library's lxml.etree module is used for XML processing. Its lxml.html.soupparser module uses BeautifulSoup to parse broken HTML, and lxml.html.html5parser uses html5lib's HTML5 parsing algorithm.
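As a rough illustration of the kind of XML processing lxml.etree is used for, here is a minimal sketch that parses a small document, reads text and attributes, adds an element, and serializes the result. The sample document and variable names are made up for this example. The import falls back to the standard library's ElementTree, which shares this part of the API, so the sketch also runs where lxml is not installed.

```python
# Prefer lxml; fall back to the standard library's compatible ElementTree API
# so the sketch still runs where lxml is not installed.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample document, used only for illustration.
XML = b"<library><book id='1'>Dive In</book><book id='2'>Fluent</book></library>"

root = etree.fromstring(XML)

# findall() accepts the ElementPath subset of XPath in both implementations.
titles = [book.text for book in root.findall(".//book")]
ids = [book.get("id") for book in root.findall("book")]

# Build and attach a new element, then serialize the tree back to bytes.
extra = etree.SubElement(root, "book", id="3")
extra.text = "Learning"
serialized = etree.tostring(root)

print(titles)
print(ids)
```

With lxml itself, full XPath 1.0 is also available through root.xpath(...), which the standard library fallback does not provide.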

Links:

https://lxml.de/ - Contains API documentation and tutorials

https://www.ibm.com/developerworks/xml/library/x-hiperfparse/ - IBM developerWorks page on lxml

5412 questions
24
votes
2 answers

In lxml, how do I remove a tag but retain all contents?

The problem is this: I have an XML fragment like so: text1 inner1 text2 inner2 text3 For the result, I want to remove all - and -Tags, but retain their (text)-contents, and childnodes just as they…
Thor
24
votes
3 answers

Get second element text with XPath?

google chrome I want to get chrome and have it working like this already. q = item.findall('.//span[@class="python"]//a') t = q[1].text # first element = 0 I'd like to combine it into a single XPath…
user479870
24
votes
2 answers

Set lxml as default BeautifulSoup parser

I'm working on a web scraping project and have run into problems with speed. To try to fix it, I want to use lxml instead of html.parser as BeautifulSoup's parser. I've been able to do this: soup = bs4.BeautifulSoup(html, 'lxml') but I don't want…
Adam Hammes
22
votes
4 answers

Filtering out certain bytes in python

I'm getting this error in my python program: ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters This question, random text from /dev/random raising an error in lxml: All strings must be XML…
y3di
22
votes
3 answers

Append element after another element using lxml

I have the following HTML markup: something goes here some contents To fix some CSS issue, I want to append a div tag
Tu Hoang
22
votes
4 answers

Python Lxml (objectify): Checking whether a tag exists

I need to check whether a certain tag exists in an XML file. For example, I want to see if the tag exists in this snippet: Hi ... Currently, I am using an ugly hack with…
Biosci3c
22
votes
5 answers

lxml will never finish building on ubuntu

I am running Ubuntu 14.04 LTS and Python 2.7.5 on VMware. When I run: sudo pip install lxml I get: Collecting lxml Using cached lxml-3.4.4.tar.gz Building wheels for collected packages: lxml Running setup.py bdist_wheel for lxml which runs…
Rorschach
22
votes
5 answers

Installing easy_install... to get to installing lxml

I've come to grips with the fact that ElementTree isn't going to do what I want it to do. I've checked out the documentation for lxml, and it appears that it will serve my purposes. To get lxml, I need to get easy_install. So I downloaded it from…
Alex
22
votes
9 answers

lxml install on windows 7 using pip and python 2.7

When I try to upgrade lxml using pip on my Windows 7 machine, I get the log printed below. When I uninstall and try to install from scratch, I get the same errors. Any ideas? Downloading/unpacking lxml from …
user2091046
22
votes
4 answers

How to re-install lxml?

Python version and device used: Python 2.7.5, Mac 10.7.5, BeautifulSoup 4.2.1. I'm following the BeautifulSoup tutorial, but when I try to parse an XML page using the lxml library I get the following error: bs4.FeatureNotFound: Couldn't find a tree…
Mark23333
22
votes
1 answer

Parsing UTF-8/unicode strings with lxml HTML

I have been trying to parse with etree.HTML() a text encoded as UTF-8 without success. → python Python 2.7.1 (r271:86832, Jun 16 2011, 16:59:05) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin Type "help",…
karlcow
22
votes
1 answer

Using python lxml.etree for huge XML files

I would like to parse a huge XML file (>200MB) using lxml.etree in Python. I tried to use etree.parse to load the XML file, but this does not work due to the file size: etree.parse('file.xml') Traceback (most recent call last): File "", line 1, in…
scdev
21
votes
8 answers

Error while installing lxml through pip: Microsoft Visual C++ 14.0 is required

I am on a Windows 10 machine and recently moved from Python 2.7 to 3.5. When trying to install lxml through pip, it stops and throws this error message: building 'lxml.etree' extension error: Microsoft Visual C++ 14.0 is required. Get it with…
Zeokav
21
votes
3 answers

Why doesn't xpath work when processing an XHTML document with lxml (in python)?

I am testing against the following test document:
John