Questions tagged [lxml]

lxml is a full-featured, high-performance Python library for processing XML and HTML.

Use this tag for questions about the lxml Python library. Per the lxml website, "The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt." The library's lxml.etree package handles XML processing, and lxml.html handles HTML. lxml.html.soupparser uses BeautifulSoup to parse broken HTML, and lxml.html.html5parser uses html5lib to apply the HTML5 parsing algorithm.
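
As a rough illustration of the lxml.etree usage described above, here is a minimal sketch. The sample document and the helper name book_titles are hypothetical and do not come from any question on this page. The sketch uses only the ElementTree-compatible part of the API, so it falls back to the standard library's xml.etree.ElementTree if lxml is not installed.

```python
# Prefer lxml; fall back to the standard library, which offers the same
# ElementTree-style API for the calls used here.
try:
    from lxml import etree
except ImportError:  # assumption: lxml may be missing in this environment
    import xml.etree.ElementTree as etree

# Hypothetical sample document, used for illustration only.
XML = (
    b"<library>"
    b"<book id='1'><title>First</title></book>"
    b"<book id='2'><title>Second</title></book>"
    b"</library>"
)


def book_titles(xml_bytes):
    """Return (id, title) pairs for every <book> directly under the root."""
    root = etree.fromstring(xml_bytes)
    return [
        (book.get("id"), book.findtext("title"))
        for book in root.findall("book")
    ]


print(book_titles(XML))
```

With lxml installed, the same root element also offers lxml-specific features such as root.xpath(...), which the standard library does not provide.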

Links:

https://lxml.de/ - Contains API documentation and tutorials

https://www.ibm.com/developerworks/xml/library/x-hiperfparse/ - IBM developerWorks page on lxml

5412 questions

votes

2 answers

Beautiful Soup and Table Scraping - lxml vs html parser

I'm trying to extract the HTML code of a table from a webpage using BeautifulSoup. ...

I would like to know why the code below works with the "html.parser" and prints back None if I change…

asked Sep 07 '14 at 20:23

LaGuille


votes

1 answer

HTML scraping using lxml and requests gives a unicode error

I'm trying to use an HTML scraper like the one provided here. It works fine for the example they provided. However, when I try using it with my webpage, I receive this error - Unicode strings with encoding declaration are not supported. Please use…

python html unicode web-scraping lxml

asked Jul 29 '14 at 19:15

user3783999

votes

1 answer

How can I preserve <br> as newlines with lxml.html text_content() or equivalent?

I want to preserve <br> tags as \n when extracting the text content from lxml elements. Example code: fragment = ' This is a text node. This is another text node. And a child element.Another child, with two…

python lxml lxml.html

asked Sep 06 '13 at 14:39

extempo

votes

7 answers

How can I parse HTML with html5lib, and query the parsed HTML with XPath?

I am trying to use html5lib to parse an HTML page into something I can query with XPath. html5lib has close to zero documentation and I've spent too much time trying to figure this problem out. Ultimate goal is to pull out the second row of a…

python parsing xpath lxml html5lib

asked Apr 01 '10 at 04:04

Dan.StackOverflow


votes

3 answers

Parsing broken XML with lxml.etree.iterparse

I'm trying to parse a huge XML file with lxml in a memory-efficient manner (i.e. streaming lazily from disk instead of loading the whole file in memory). Unfortunately, the file contains some bad ASCII characters that break the default parser. The…

python xml sax lxml

asked Feb 28 '10 at 21:55

erikcw


votes

3 answers

Creating a doctype with lxml's etree

I want to add doctypes to my XML documents that I'm generating with LXML's etree. However, I cannot figure out how to add a doctype. Hardcoding and concatenating the string is not an option. I was expecting something along the lines of how PI's are…

python doctype lxml elementtree

asked Jun 14 '09 at 00:41

Marijn

votes

3 answers

using lxml and iterparse() to parse a big (+- 1Gb) XML file

I have to parse a 1Gb XML file with a structure such as below and extract the text within the tags "Author" and "Content": MM/DD/YY Last Name, Name Lorem ipsum…

python xml parsing lxml iterparse

asked Mar 24 '12 at 22:25

mvime

votes

2 answers

How to find XML Elements via XPath in Python in a namespace-agnostic way?

Since I had this annoying issue for the 2nd time, I thought that asking would help. Sometimes I have to get Elements from XML documents, but the ways to do this are awkward. I’d like to know a Python library that does what I want, an elegant way to…

python xml xpath lxml elementtree

asked Apr 06 '11 at 19:57

flying sheep


votes

8 answers

lxml.etree, element.text doesn't return the entire text from an element

I scraped some HTML via XPath, which I then converted into an etree. Something similar to this: text1 link text2 but when I call element.text, I only get text1 (It must be there, when I check my query in FireBug, the text of…

python xml lxml elementtree xml.etree

asked Jan 22 '11 at 19:56

user522034

votes

2 answers

Python lxml Subelement with text value?

Is it possible to somehow create element with default text value? So I would not need to do it like this? from lxml import etree root = etree.Element('root') a = etree.SubElement(root, 'a') a.text = 'some text' # Avoid this extra step? I mean you…

python lxml

asked Oct 28 '15 at 09:12

Andrius


votes

2 answers

How to add a namespace to an attribute in lxml

I'm trying to create an xml entry that looks like this using python and lxml: I'm using python and lxml. I'm having trouble with the adlcp:scormtype attribute. I'm new to xml so please correct…

python xml lxml scorm

asked Sep 03 '09 at 16:17

Mateo


votes

1 answer

Parse SGML with Open Arbitrary Tags in Python 3

I am trying to parse a file such as: http://www.sec.gov/Archives/edgar/data/1409896/000118143112051484/0001181431-12-051484.hdr.sgml I am using Python 3 and have been unable to find a solution with existing libraries to parse an SGML file with open…

python xml python-3.x lxml sgml

asked Sep 20 '12 at 02:39

borncamp

votes

2 answers

How to write namespaced element attributes with LXML?

I'm using lxml (2.2.8) to create and write out some XML (specifically XGMML). The app which will be reading it is apparently fairly fussy and wants to see a top level element with:

python lxml xml-namespaces cytoscape

asked Oct 09 '11 at 10:56

timday


votes

1 answer

Extracting lxml xpath for html table

I have a html doc similar to following:

python html xpath html-table lxml

asked Apr 07 '11 at 19:10

mkt2012

votes

1 answer

Pylint Error Message: "E1101: Module 'lxml.etree' has no 'strip_tags' member'"

I am experimenting with lxml and python for the first time for a personal project, and I am attempting to strip tags from a bit of source code using etree.strip_tags(). For some reason, I keep getting the error message: "E1101: Module 'lxml.etree'…

python lxml elementtree pylint

asked Apr 07 '17 at 14:20

Aaron Viscichini
