I am connecting to isbndb.com for book information and their response looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<ISBNdb server_time="2005-02-25T23:03:41">
 <BookList total_results="1" page_size="10" page_number="1" shown_results="1">
  <BookData book_id="somebook" isbn="0123456789">
   <Title>Interesting Book</Title>
   <TitleLong>Interesting Book: Read it or else..</TitleLong>
   <AuthorsText>John Doe</AuthorsText>
   <PublisherText>Acme Publishing</PublisherText>
  </BookData>
 </BookList>
</ISBNdb>

What is the best way to turn this data into an object using appengine (Python)?

I need the ISBN (an attribute of BookData), but I also need the text contents (as opposed to the tag names) of all the children of BookData.

jcuenod

2 Answers

use etree:)

>>> xml = """<?xml version="1.0" encoding="UTF-8"?>
... <ISBNdb server_time="2005-02-25T23:03:41">
...  <BookList total_results="1" page_size="10" page_number="1" shown_results="1">
...   <BookData book_id="somebook" isbn="0123456789">
...    <Title>Interesting Book</Title>
...    <TitleLong>Interesting Book: Read it or else..</TitleLong>
...    <AuthorsText>John Doe</AuthorsText>
...    <PublisherText>Acme Publishing</PublisherText>
...   </BookData>
...  </BookList>
... </ISBNdb>"""

>>> from xml.etree import ElementTree as etree
>>> tree = etree.fromstring(xml)
>>> for book in tree.iterfind('BookList/BookData'):
...     print 'isbn:', book.attrib['isbn']
...     for child in book.getchildren():
...         print '%s :' % child.tag, child.text
... 
isbn: 0123456789
Title : Interesting Book
TitleLong : Interesting Book: Read it or else..
AuthorsText : John Doe
PublisherText : Acme Publishing
>>> 

voila;)
virhilo
  • I am actually trying to turn the XML data into an object with attributes like Book.isbn, Book.title, etc. I'll accept this, though, as I think my question was unclear and this is likely the closest I will get. I'm just going to put a switch-case (using if-else) into the for loop and build the object that way, so if you have a better idea, please share. – jcuenod Jan 07 '11 at 18:54
  • You can do it like this:

        class Book(object):
            def __init__(self, isbn, title, title_long):
                self.isbn = isbn
                self.title = title
                self.title_long = title_long
                # etc.

        books = []
        for book in tree.iterfind('BookList/BookData'):
            # findtext() returns the element's text rather than the element itself
            book_obj = Book(book.attrib['isbn'], book.findtext('Title'), book.findtext('TitleLong'))  # etc.
            books.append(book_obj)

    – virhilo Jan 07 '11 at 19:14
  • That's brilliant, thanks. By the way, your code above needed a fair bit of modification. I'm not sure whether my App Engine is old, but iterfind is only in Python 2.7 and wasn't available. Also, when I used .find, I could iterate over the children of the result without calling getchildren. But thanks very much. – jcuenod Jan 07 '11 at 19:40
  • Yes, the `iterfind()` method was added in Python 2.7. GAE [supports Python 2.5](http://code.google.com/intl/sv/appengine/kb/general.html#language). To make @virhilo's solution work in 2.5, replace `iterfind()` with `findall()`. – mzjn Jan 07 '11 at 20:22
  • I'm using find() because the api will return at most one value when searching by isbn (as it's unique). Thanks – jcuenod Jan 08 '11 at 10:47
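
Pulling the comments above together, here is a minimal sketch of the `Book` object approach, not the original answerer's exact code. It uses `findall()` and `findtext()` rather than `iterfind()`, so it should not depend on Python 2.7 features. The `from_element` and `parse_books` names, and the `authors` and `publisher` attributes, are hypothetical choices made for illustration.

```python
from xml.etree import ElementTree as etree

# Sample response copied from the question.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<ISBNdb server_time="2005-02-25T23:03:41">
 <BookList total_results="1" page_size="10" page_number="1" shown_results="1">
  <BookData book_id="somebook" isbn="0123456789">
   <Title>Interesting Book</Title>
   <TitleLong>Interesting Book: Read it or else..</TitleLong>
   <AuthorsText>John Doe</AuthorsText>
   <PublisherText>Acme Publishing</PublisherText>
  </BookData>
 </BookList>
</ISBNdb>"""


class Book(object):
    """Plain container for one <BookData> element (hypothetical class)."""

    def __init__(self, isbn, title, title_long, authors, publisher):
        self.isbn = isbn
        self.title = title
        self.title_long = title_long
        self.authors = authors
        self.publisher = publisher

    @classmethod
    def from_element(cls, element):
        # The ISBN is an attribute; the other fields are child-element text.
        # findtext() returns None when a child element is missing.
        return cls(
            isbn=element.get('isbn'),
            title=element.findtext('Title'),
            title_long=element.findtext('TitleLong'),
            authors=element.findtext('AuthorsText'),
            publisher=element.findtext('PublisherText'),
        )


def parse_books(xml_text):
    """Return a list of Book objects, one per <BookData> element."""
    root = etree.fromstring(xml_text)
    return [Book.from_element(e) for e in root.findall('BookList/BookData')]


books = parse_books(SAMPLE_XML)
book = books[0]
print(book.isbn)   # prints 0123456789
print(book.title)  # prints Interesting Book
```

When a lookup by ISBN is expected to return at most one book, `root.find('BookList/BookData')` could replace the `findall()` loop; it returns `None` if nothing matches, so that case would need a check before calling `from_element`.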
There is an excellent Python module called BeautifulSoup. Use the BeautifulStoneSoup class for XML parsing.

More info: http://www.crummy.com/software/BeautifulSoup/documentation.html

infrared