Should I use PyXML or what's in the standard library?
3 Answers
10
ElementTree is provided as part of the standard Python library. ElementTree is pure Python, and cElementTree is the faster C implementation:
# Try to use the C implementation first, falling back to pure Python
try:
    from xml.etree import cElementTree as ElementTree
except ImportError:
    from xml.etree import ElementTree
Here's an example usage, where I'm consuming xml from a RESTful web service:
import urllib
import urllib2

# api_url, api_key and is_valid_collection() are defined elsewhere in the module
def find(*args, **kwargs):
    """Find a book in the collection specified"""
    search_args = [('access_key', api_key)]
    if not is_valid_collection(kwargs['collection']):
        return None
    kwargs.pop('collection')
    for key in kwargs:
        # Only the first keyword is honored
        if kwargs[key]:
            search_args.append(('index1', key))
            search_args.append(('value1', kwargs[key]))
            break
    url = urllib.basejoin(api_url, '%s.xml' % 'books')
    data = urllib.urlencode(search_args)
    req = urllib2.urlopen(url, data)
    rdata = []
    chunk = 'xx'
    while chunk:
        chunk = req.read()
        if chunk:
            rdata.append(chunk)
    tree = ElementTree.fromstring(''.join(rdata))
    results = []
    for elem in tree.getiterator('BookData'):
        results.append(
            {'isbn': elem.get('isbn'),
             'isbn13': elem.get('isbn13'),
             'title': elem.find('Title').text,
             'author': elem.find('AuthorsText').text,
             'publisher': elem.find('PublisherText').text}
        )
    return results
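A hypothetical call might look like the following (the collection name and keyword argument here are made up; the values that actually work depend on the web service being queried):

# Hypothetical usage -- 'books' and 'title' are illustrative only
results = find(collection='books', title='Programming Python')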

vezult
- vezult, how come sometimes you use elem.get() and sometimes you use elem.find().text? – rick May 07 '09 at 00:52
- @rick: elem.get() is fetching the value of an element attribute, while elem.find() is searching for elements contained within the elem element. – vezult May 08 '09 at 02:49
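To illustrate the difference, here's a minimal sketch; the sample XML string is made up, but matches the shape the answer's code expects:

from xml.etree import ElementTree

elem = ElementTree.fromstring(
    '<BookData isbn="0596158068"><Title>Programming Python</Title></BookData>')

print elem.get('isbn')         # reads the 'isbn' attribute on the element itself
print elem.find('Title').text  # finds the <Title> child element and reads its text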
- `tree = ElementTree.parse(urllib2.urlopen(url, data))` should work without the `rdata` list. – jfs Feb 27 '10 at 22:51
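A rough sketch of that simplification, reusing the url and data variables from the answer above: parse() accepts file-like objects, so the manual read loop and the rdata list can be dropped.

# parse() reads directly from the file-like response object
tree = ElementTree.parse(urllib2.urlopen(url, data))
for elem in tree.getiterator('BookData'):
    print elem.get('isbn'), elem.find('Title').text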
3
I always prefer to use the standard library when possible. ElementTree is well known amongst Pythonistas, so you should be able to find plenty of examples. Parts of it have also been optimized in C, so it's quite fast.

Justin
0
There's also BeautifulSoup, which has an API some might prefer. Here's an example of how you can extract all tweets that have been favorited from Twitter's Public Timeline:
from BeautifulSoup import BeautifulStoneSoup
import urllib

# Fetch the public timeline XML and parse it with BeautifulStoneSoup
xml = urllib.urlopen('http://twitter.com/statuses/public_timeline.xml').read()
favorited = []
soup = BeautifulStoneSoup(xml)
statuses = soup.findAll('status')
for status in statuses:
    if status.find('favorited').contents != [u'false']:
        favorited.append(status)
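As a follow-up usage sketch, you could then print the text of each favorited status; the <text> child element is an assumption based on the layout of the old public timeline XML:

# Assumes each <status> carries a <text> child with the tweet body
for status in favorited:
    print status.find('text').string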

Henrik Lied
- Alas, BeautifulSoup is no longer maintained. I would avoid it, and lean towards lxml or ElementTree. – mlissner Apr 09 '13 at 20:27
- @mlissner I can't see where it says on the BS4 website that it is no longer maintained. Is that really the case? – mrkzq Jun 28 '16 at 08:46
- At one point the maintainer was threatening to step down, but it seems that reality never came to pass. – mlissner Jun 28 '16 at 13:41