0

I'm very new to ElementTree and am trying to parse an XML file for a TV addon for xbmc. Below is the code I'm having trouble with. I think my XPath is not correct, and the placeholder is not working on the attribute.

This is the XML file I'm working with: http://services.tvrage.com/myfeeds/episode_list.php?key=ag6txjP0RH4m0c8sZk2j&sid=2930

    seasonnum = root2.findall("/Show/Episodelist/Season[@no='%s']/episode/seasonnum" % (season))


    import xml.etree.ElementTree as ET
    import urllib
    tree2 = ET.parse(urllib.urlopen(url))
    root2 = tree2.getroot()
    seasonnum = tree2.findall("./Episodelist/Season[@no='%s']/episode/seasonnum" % '1')
    print seasonnum

This is the error I get: `SyntaxError: expected path separator ([)`

Mikewave

4 Answers

2

Using ElementTree:

>>> from xml.etree import ElementTree
>>> import urllib2
>>> url = 'http://services.tvrage.com/myfeeds/episode_list.php?key=ag6txjP0RH4m0c8sZk2j&sid=2930'
>>> request = urllib2.Request(url, headers={"Accept" : "application/xml"})
>>> u = urllib2.urlopen(request)
>>> tree = ElementTree.parse(u)
>>> rootElem = tree.getroot()
>>> [s.text for s in rootElem.findall('.//Season[@no="2"]/episode/seasonnum')]
['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', 
 '15', '16', '17', '18', '19', '20', '21', '22']
Guy Gavriely
1

According to the `xml.etree.ElementTree` documentation on XPath support:

This module provides limited support for XPath expressions for locating elements in a tree. The goal is to support a small subset of the abbreviated syntax; a full XPath engine is outside the scope of the module.

You may need a third-party library like lxml to use full XPath.

Example:

>>> import lxml.etree
>>>
>>> url = 'http://services.tvrage.com/myfeeds/episode_list.php?key=ag6txjP0RH4m0c8sZk2j&sid=2930'
>>> tree = lxml.etree.parse(url)
>>> tree.xpath("/Show/Episodelist/Season[@no='%s']/episode/seasonnum/text()" % 1)
['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']

UPDATE

To use `xml.etree.ElementTree` from the standard library instead, the XPath should be slightly modified so that it is relative to the root element:

>>> import urllib
>>> import xml.etree.ElementTree as ET
>>>
>>> f = urllib.urlopen(url)
>>> tree = ET.parse(f)
>>> [e.text for e in tree.findall("./Episodelist/Season[@no='%s']/episode/seasonnum" % 1)]
['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']
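The difference from the lxml version is that the path starts at `./Episodelist` instead of `/Show/...`, because ElementTree evaluates it relative to the root `<Show>` element. The following is only a sketch to illustrate that: the `SAMPLE` document is a hypothetical stand-in that mirrors the feed's layout, not real data from the feed, and it assumes an ElementTree version that supports `[@attrib='value']` predicates.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the feed layout (not real TVRage data).
SAMPLE = """<Show>
  <Episodelist>
    <Season no="1">
      <episode><seasonnum>01</seasonnum></episode>
      <episode><seasonnum>02</seasonnum></episode>
    </Season>
    <Season no="2">
      <episode><seasonnum>01</seasonnum></episode>
    </Season>
  </Episodelist>
</Show>"""

# fromstring() returns the root element, which is <Show> itself.
root = ET.fromstring(SAMPLE)

season = 1
# The path is relative to <Show>, so it begins with its child <Episodelist>.
path = "./Episodelist/Season[@no='%s']/episode/seasonnum" % season
numbers = [e.text for e in root.findall(path)]
```

Here `%s` converts the integer `season` to text, so the predicate compares against the attribute value `'1'`.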
falsetru
  • Is there a way to do this with ElementTree? I'm using it for xbmc, and that module is already installed. – Mikewave Feb 20 '14 at 11:45
  • @Mikewave, I added an alternative that uses `xml.etree.ElementTree`. – falsetru Feb 20 '14 at 11:53
  • I've tried that and I'm getting `SyntaxError: expected path separator ([)`. I've put the code that I use in the question above. – Mikewave Feb 20 '14 at 12:10
  • @Mikewave, Indent the code correctly. And please post the full traceback if possible. – falsetru Feb 20 '14 at 12:20
  • @Mikewave, The code works fine. See [a screencast I just recorded](http://asciinema.org/a/7761). The code is slightly modified (`url` added, `root2 = ..` line removed because `root2` is not used). – falsetru Feb 20 '14 at 12:24
0

I have tried your example and it works. Here is a condensed, complete version:

import urllib
import xml.etree.ElementTree as ET

url = 'http://services.tvrage.com/myfeeds/episode_list.php?key=ag6txjP0RH4m0c8sZk2j&sid=2930'
tree = ET.parse(urllib.urlopen(url))
seasons = tree.findall("./Episodelist/Season[@no='%s']/episode/seasonnum" % '1')

for s in seasons:
    print s.text

The only problem I can think of is that you somehow downloaded a partial XML document. That is unlikely, but I do not know of any other explanation. Note that the script above is taken from your question; I only added the for loop.
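One way to rule out the partial-download explanation is to check whether the downloaded text parses at all. This is a minimal sketch, not part of the original script: `parses_cleanly` is a hypothetical helper, the two strings are made-up stand-ins for a complete and a cut-off download, and `ET.ParseError` assumes Python 2.7 or later.

```python
import xml.etree.ElementTree as ET


def parses_cleanly(xml_text):
    """Return True if xml_text is well-formed XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Made-up inputs: a complete document and one cut off midway.
complete = "<Show><Episodelist></Episodelist></Show>"
truncated = "<Show><Episodelist>"

complete_ok = parses_cleanly(complete)
truncated_ok = parses_cleanly(truncated)
```

If the real download parses cleanly, the document is complete and the problem lies in the path expression rather than in the data.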

Hai Vu
0
    import xml.etree.ElementTree as ET
    import urllib
    content = urllib.urlopen(url).read()
    tree2 = ET.fromstring(content)
    tvrage_seasons = tree2.findall('.//Season')

I had to do it this way because, for some reason, the ElementTree in xbmc seems to have an error or limitation that stops the original path from working. But this worked out for me!
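To get from the list of all seasons to the episodes of a single season, the attribute can be compared in plain Python instead of in the path. The sketch below assumes, without confirmation in this thread, that the ElementTree bundled with xbmc is an older version whose path syntax lacks `[@attrib='value']` predicates. `SAMPLE` is a hypothetical stand-in for the feed and `season_episode_numbers` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the feed layout (not real TVRage data).
SAMPLE = """<Show>
  <Episodelist>
    <Season no="1">
      <episode><seasonnum>01</seasonnum></episode>
      <episode><seasonnum>02</seasonnum></episode>
    </Season>
    <Season no="2">
      <episode><seasonnum>01</seasonnum></episode>
    </Season>
  </Episodelist>
</Show>"""


def season_episode_numbers(root, season):
    """Return the seasonnum texts for one season without a path predicate."""
    numbers = []
    for season_elem in root.findall('.//Season'):
        # Compare the "no" attribute in Python instead of using [@no='...'].
        if season_elem.get('no') == str(season):
            for node in season_elem.findall('episode/seasonnum'):
                numbers.append(node.text)
    return numbers


root = ET.fromstring(SAMPLE)
first_season = season_episode_numbers(root, 1)
```

This uses only plain tag paths and `.//`, the same syntax as the `findall('.//Season')` call above.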

Mikewave