Issue with xpath for python

Question

I'm very new to ElementTree, and I'm trying to parse an XML file for a TV add-on for XBMC. Below is the code I'm having trouble with. I think my XPath is not correct and the placeholder is not working on the attribute!

This is the XML file I'm working with: http://services.tvrage.com/myfeeds/episode_list.php?key=ag6txjP0RH4m0c8sZk2j&sid=2930

    seasonnum = root2.findall("/Show/Episodelist/Season[@no='%s']/episode/seasonnum" % (season))


    import xml.etree.ElementTree as ET
    import urllib
    tree2 = ET.parse(urllib.urlopen(url))
    root2 = tree2.getroot()
    seasonnum = tree2.findall("./Episodelist/Season[@no='%s']/episode/seasonnum" % '1')
    print seasonnum

This is the error I get:

    SyntaxError: expected path separator ([)

score 2 · Answer 1 · answered Feb 20 '14 at 04:12

Using ElementTree:

>>> from xml.etree import ElementTree
>>> import urllib2
>>> url = 'http://services.tvrage.com/myfeeds/episode_list.php?key=ag6txjP0RH4m0c8sZk2j&sid=2930'
>>> request = urllib2.Request(url, headers={"Accept" : "application/xml"})
>>> u = urllib2.urlopen(request)
>>> tree = ElementTree.parse(u)
>>> rootElem = tree.getroot()
>>> [s.text for s in rootElem.findall('.//Season[@no="2"]/episode/seasonnum')]
['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', 
 '15', '16', '17', '18', '19', '20', '21', '22']
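To try the same query offline, here is a Python 3 sketch against a small inline document. The XML is a hypothetical, trimmed-down stand-in for the feed that only assumes the Show/Episodelist/Season/episode/seasonnum layout used above, and `season_numbers` is a made-up helper name. The season number is formatted into the path string before `findall` runs, which is all the `%s` placeholder in the question does as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for the TVRage episode_list feed;
# only the element layout matters here.
SAMPLE = """
<Show>
  <name>Example Show</name>
  <Episodelist>
    <Season no="1">
      <episode><seasonnum>01</seasonnum></episode>
      <episode><seasonnum>02</seasonnum></episode>
    </Season>
    <Season no="2">
      <episode><seasonnum>01</seasonnum></episode>
      <episode><seasonnum>02</seasonnum></episode>
      <episode><seasonnum>03</seasonnum></episode>
    </Season>
  </Episodelist>
</Show>
"""


def season_numbers(root, season):
    """Return the <seasonnum> texts for one season (hypothetical helper)."""
    # './/' searches all descendants of root; the predicate matches on the
    # 'no' attribute. The season number is formatted into the path string
    # before findall() ever sees it.
    path = ".//Season[@no='{}']/episode/seasonnum".format(season)
    return [e.text for e in root.findall(path)]


root = ET.fromstring(SAMPLE)
print(season_numbers(root, 2))  # ['01', '02', '03']
```

Passing `2` or `'2'` behaves the same, because the value is turned into text before the query is evaluated.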

score 1 · Answer 2 · falsetru · 2014-02-20T11:53:12.617

According to the xml.etree.ElementTree documentation (XPath support):

This module provides limited support for XPath expressions for locating elements in a tree. The goal is to support a small subset of the abbreviated syntax; a full XPath engine is outside the scope of the module.

You may need a third-party library such as lxml to use full XPath.

Example:

>>> import lxml.etree
>>>
>>> url = 'http://services.tvrage.com/myfeeds/episode_list.php?key=ag6txjP0RH4m0c8sZk2j&sid=2930'
>>> tree = lxml.etree.parse(url)
>>> tree.xpath("/Show/Episodelist/Season[@no='%s']/episode/seasonnum/text()" % 1)
['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']

UPDATE

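To use xml.etree.ElementTree, the XPath has to be modified slightly: the search is evaluated relative to the root <Show> element, so the leading /Show is dropped and the path starts with ./ instead: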

>>> import urllib
>>> import xml.etree.ElementTree as ET
>>>
>>> f = urllib.urlopen(url)
>>> tree = ET.parse(f)
>>> [e.text for e in tree.findall("./Episodelist/Season[@no='%s']/episode/seasonnum" % 1)]
['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']
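A Python 3 sketch of why the leading /Show has to go, using a hypothetical miniature of the feed. The search runs relative to the root <Show> element: a leading / on `Element.findall` raises `SyntaxError`, and a relative path that still names Show matches nothing because <Show> has no <Show> child. (On the `ElementTree` object returned by `ET.parse`, a leading / is instead rewritten to ./ with a `FutureWarning`, so the absolute path silently returns an empty list.)

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the feed: <Show> is the root element.
SAMPLE = (
    "<Show><Episodelist>"
    "<Season no='1'><episode><seasonnum>01</seasonnum></episode></Season>"
    "</Episodelist></Show>"
)

root = ET.fromstring(SAMPLE)

# Relative path: evaluated from <Show>, so it starts at its child <Episodelist>.
relative = "./Episodelist/Season[@no='1']/episode/seasonnum"
print([e.text for e in root.findall(relative)])  # ['01']

# Absolute path: Element.findall rejects a leading '/' with SyntaxError.
try:
    root.findall("/Show/Episodelist/Season[@no='1']/episode/seasonnum")
except SyntaxError as exc:
    print("SyntaxError:", exc)

# Keeping 'Show' in a relative path finds nothing: <Show> has no <Show> child.
print(root.findall("./Show/Episodelist/Season"))  # []
```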

edited Feb 20 '14 at 11:53 · answered Feb 20 '14 at 04:03 by falsetru

Is there a way to use ElementTree? I'm using this for XBMC, and that module is already installed. – Mikewave Feb 20 '14 at 11:45
@Mikewave, I added an alternative that uses `xml.etree.ElementTree`. – falsetru Feb 20 '14 at 11:53
I've tried that and I'm getting SyntaxError: expected path separator ([). I've put the code I use in the question above. – Mikewave Feb 20 '14 at 12:10
@Mikewave, Indent the code correctly. And please post the full traceback if possible. – falsetru Feb 20 '14 at 12:20
@Mikewave, The code works fine. See [a screencast I just recorded](http://asciinema.org/a/7761). The code is slightly modified (`url` added, `root2 = ..` line removed because `root2` is not used). – falsetru Feb 20 '14 at 12:24

score 0 · Answer 3 · answered Feb 20 '14 at 16:11

I have tried your example and it works. Here is a condensed, complete version:

    import urllib
    import xml.etree.ElementTree as ET

    url = 'http://services.tvrage.com/myfeeds/episode_list.php?key=ag6txjP0RH4m0c8sZk2j&sid=2930'
    tree = ET.parse(urllib.urlopen(url))
    seasons = tree.findall("./Episodelist/Season[@no='%s']/episode/seasonnum" % '1')

    for s in seasons:
        print s.text

The only problem I can think of is that you somehow downloaded a partial XML document. That is unlikely, but I cannot think of any other explanation. Note that the script above is taken from your question; I only added the for loop.
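If a truncated download really were the cause, ElementTree would fail with a parse error rather than quietly return fewer matches, since a cut-off document never closes its root element. A minimal Python 3 sketch; `parse_feed` is a hypothetical helper, and the byte strings stand in for the downloaded content:

```python
import xml.etree.ElementTree as ET


def parse_feed(data):
    """Parse feed bytes; report a truncated/malformed document (hypothetical helper)."""
    try:
        return ET.fromstring(data)
    except ET.ParseError as exc:
        # exc.position is a (line, column) tuple pointing at the failure.
        print("Malformed or incomplete XML at line %d, column %d" % exc.position)
        return None


complete = b"<Show><Episodelist><Season no='1'/></Episodelist></Show>"
truncated = complete[:30]  # simulate a download cut off mid-document

print(parse_feed(complete) is not None)  # True
print(parse_feed(truncated))             # error message, then None
```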

score 0 · Accepted Answer · answered Feb 20 '14 at 19:36

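    import xml.etree.ElementTree as ET
    import urllib

    url = 'http://services.tvrage.com/myfeeds/episode_list.php?key=ag6txjP0RH4m0c8sZk2j&sid=2930'
    content = urllib.urlopen(url).read()
    tree2 = ET.fromstring(content)  # fromstring returns the root <Show> element
    tvrage_seasons = tree2.findall('.//Season')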

I had to do it this way because, for some reason, the ElementTree in XBMC seems to have a bug or limitation that stops the original query from working. But this worked for me!
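If the installed ElementTree rejects the `[@no='...']` predicate (my guess is an older ElementTree release, since as far as I know predicate support arrived with ElementTree 1.3 in Python 2.7, but that is an assumption about XBMC's bundled Python), the attribute can be checked in Python instead. A Python 3 sketch with a hypothetical inline document and a made-up `season_numbers_compat` helper; it only uses plain tag paths and `Element.get`:

```python
import xml.etree.ElementTree as ET

# Hypothetical inline stand-in for the downloaded feed.
SAMPLE = """<Show><Episodelist>
  <Season no="1"><episode><seasonnum>01</seasonnum></episode></Season>
  <Season no="2">
    <episode><seasonnum>01</seasonnum></episode>
    <episode><seasonnum>02</seasonnum></episode>
  </Season>
</Episodelist></Show>"""


def season_numbers_compat(root, season):
    """Seasonnum texts for one season without XPath predicates (hypothetical helper)."""
    wanted = str(season)  # attribute values are always strings
    result = []
    for season_elem in root.findall('.//Season'):
        # Element.get returns None when the attribute is missing.
        if season_elem.get('no') == wanted:
            for num in season_elem.findall('episode/seasonnum'):
                result.append(num.text)
    return result


root = ET.fromstring(SAMPLE)
print(season_numbers_compat(root, 2))  # ['01', '02']
```

Filtering in Python costs one extra loop, but it avoids depending on which XPath subset a given ElementTree version supports.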
