python xml xpath query using tag and attribute with ns

Question

I must be doing something inherently wrong here; every example I've seen and searched for on SO seems to suggest this would work.

I'm trying to use an XPath search with the lxml etree library to parse a Garmin TCX file:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2 http://www.garmin.com/xmlschemas/TrainingCenterDatabasev2.xsd">

  <Workouts>
    <Workout Sport="Biking">
      <Name>3P2 WK16 - 3</Name>
      <Step xsi:type="Step_t">
        <StepId>1</StepId>
        <Name>[MP19]6:28-6:38</Name>
        <Duration xsi:type="Distance_t">
          <Meters>13000</Meters>
        </Duration>
        <Intensity>Active</Intensity>
        <Target xsi:type="Speed_t">
          <SpeedZone xsi:type="PredefinedSpeedZone_t">
            <Number>2</Number>
          </SpeedZone>
        </Target>
      </Step>
     ......
     </Workout>
</Workouts>
</TrainingCenterDatabase>

I'd like to return the SpeedZone Element only where the type is PredefinedSpeedZone_t. I thought I'd be able to do:

from lxml import etree as ET

root = ET.parse('file.tcx')
xsi = {'xsi': 'http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2'}
for speed_zone in root.xpath(".//xsi:SpeedZone[@xsi:type='PredefinedSpeedZone_t']", namespaces=xsi):
    print(speed_zone)

Though this doesn't seem to be the case. I've tried lots of combinations of removing/adding namespaces, to no avail. If I remove the attribute predicate and leave it as ".//xsi:SpeedZone", then this does return:

<Element {http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2}SpeedZone at 0x2595188>

as I'd expect.

I guess I could do it inside the for loop but it just feels like it should be possible on one line!

score 4 · Accepted Answer · answered Jul 29 '15 at 15:30

I'm a bit late, but the other answers are confusing IMHO.

In the Python code in the question and in the two other answers, the xsi prefix is bound to the http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2 URI. But in the XML document with the Garmin data, xsi is bound to http://www.w3.org/2001/XMLSchema-instance.

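Since there are two namespaces at play here, I think the following code gives a clearer picture of what's going on. The tcd prefix is bound to the document's default namespace, i.e. the one declared with a plain xmlns="..." that unprefixed elements such as SpeedZone belong to. XPath 1.0 has no notion of a default namespace, so the query needs an explicit prefix for it; the prefix name itself (tcd here) is arbitrary.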

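from lxml import etree

# One prefix per namespace URI; the prefixes only have to match the XPath below.
NSMAP = {"tcd": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2",
         "xsi": "http://www.w3.org/2001/XMLSchema-instance"}

root = etree.parse('file.tcx')

# Element name in the Garmin namespace, attribute in the XMLSchema-instance namespace.
for speed_zone in root.xpath(".//tcd:SpeedZone[@xsi:type='PredefinedSpeedZone_t']",
                             namespaces=NSMAP):
    print(speed_zone)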

Output:

<Element {http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2}SpeedZone at 0x25b7e18>
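To try this without the original file, here is a minimal sketch that runs both mappings against a trimmed-down inline copy of the TCX document from the question. The names TCX, wrong and right are made up for illustration, and the inline XML is an assumption standing in for file.tcx.

```python
from lxml import etree

# Trimmed-down stand-in for the TCX file in the question (not the full file).
TCX = b"""<?xml version="1.0" encoding="UTF-8"?>
<TrainingCenterDatabase
    xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Workouts>
    <Workout Sport="Biking">
      <Step xsi:type="Step_t">
        <Target xsi:type="Speed_t">
          <SpeedZone xsi:type="PredefinedSpeedZone_t">
            <Number>2</Number>
          </SpeedZone>
        </Target>
      </Step>
    </Workout>
  </Workouts>
</TrainingCenterDatabase>"""

root = etree.fromstring(TCX)
query_attr = "[@xsi:type='PredefinedSpeedZone_t']"

# Mapping from the question: "xsi" points at the Garmin URI, so @xsi:type is
# looked up in the Garmin namespace, where no such attribute exists.
wrong = {"xsi": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"}
wrong_hits = root.xpath(".//xsi:SpeedZone" + query_attr, namespaces=wrong)

# One prefix per namespace: "tcd" for the elements, "xsi" for the attribute.
right = {"tcd": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2",
         "xsi": "http://www.w3.org/2001/XMLSchema-instance"}
right_hits = root.xpath(".//tcd:SpeedZone" + query_attr, namespaces=right)

print(len(wrong_hits), len(right_hits))  # 0 1
```

The first query comes back empty because the predicate asks for a type attribute in the Garmin namespace; the second finds the element because each prefix is bound to the URI the document actually uses.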

I still had to re-read your answer 3 or 4 times, but once it clicked, it looked so obvious. Not sure why I couldn't spot that sooner. I think this deserves to be the accepted answer, so I've changed it. – kikixx, Jul 31 '15 at 15:18

alecxe · Answer 2 · 2014-03-17T20:31:32.593

1

One way to work around this is to avoid specifying the attribute name and use * instead (note that this matches any attribute with that value, not just xsi:type):

.//xsi:SpeedZone[@*='PredefinedSpeedZone_t']

Another option (not as nice as the previous one) is to get all the SpeedZone elements and check the attribute value in the loop:

# root is the ElementTree returned by parse(); nsmap lives on the root element.
attribute_name = '{%s}type' % root.getroot().nsmap['xsi']
for speed_zone in root.xpath(".//xsi:SpeedZone", namespaces=xsi):
    if speed_zone.attrib.get(attribute_name) == 'PredefinedSpeedZone_t':
        print(speed_zone)

Hope that helps.
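For comparison, here is a small sketch of both options side by side. The Targets wrapper element and the note attribute are made up purely to show the difference; they are not part of the file in the question.

```python
from lxml import etree

TCD = "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical fragment: the second SpeedZone carries the searched-for value
# in an unrelated, made-up "note" attribute.
DOC = """<Targets xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <SpeedZone xsi:type="PredefinedSpeedZone_t"/>
  <SpeedZone xsi:type="CustomSpeedZone_t" note="PredefinedSpeedZone_t"/>
</Targets>"""

root = etree.fromstring(DOC)
ns = {"tcd": TCD}

# Option 1: wildcard attribute test. Matches on ANY attribute with that value.
wildcard = root.xpath(".//tcd:SpeedZone[@*='PredefinedSpeedZone_t']", namespaces=ns)

# Option 2: fetch all SpeedZone elements, then check the attribute by its
# fully qualified {namespace}name key.
type_attr = "{%s}type" % XSI
exact = [zone for zone in root.xpath(".//tcd:SpeedZone", namespaces=ns)
         if zone.get(type_attr) == "PredefinedSpeedZone_t"]

print(len(wildcard), len(exact))  # 2 1
```

On a real TCX file the wildcard is probably safe, since it seems unlikely that another attribute would hold the string PredefinedSpeedZone_t, but the loop version is the stricter of the two.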

edited Mar 17 '14 at 20:31

answered Mar 17 '14 at 20:24


Thanks, the wildcard attribute did the job and gives my OCD what it needs for the one liner :) I guess it's something odd between the XPath and etree support then. – kikixx Mar 17 '14 at 23:07

score 1 · Answer 3 · answered Mar 17 '14 at 21:16

If all else fails you can still use

".//xsi:SpeedZone[@*[name() = 'xsi:type' and . = 'PredefinedSpeedZone_t']]"

Using name() is not as nice as directly addressing the namespaced attribute, but at least etree understands it.
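One caveat with name() is that it compares against the prefix as written in the document, so it stops matching if the file happens to declare the same namespace under a different prefix. A possible variation, sketched below, tests local-name() and namespace-uri() instead. The Targets wrapper and the xs prefix are assumptions made up for the demonstration.

```python
from lxml import etree

# Hypothetical fragment that binds the XMLSchema-instance URI to "xs"
# instead of the usual "xsi".
DOC = """<Targets xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"
         xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
  <SpeedZone xs:type="PredefinedSpeedZone_t"/>
</Targets>"""

root = etree.fromstring(DOC)
ns = {"tcd": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"}

# name() sees "xs:type" here, so a test against the literal "xsi:type" fails.
by_name = root.xpath(
    ".//tcd:SpeedZone[@*[name() = 'xsi:type' and . = 'PredefinedSpeedZone_t']]",
    namespaces=ns)

# local-name() plus namespace-uri() ignores whatever prefix the document chose.
by_uri = root.xpath(
    ".//tcd:SpeedZone[@*[local-name() = 'type'"
    " and namespace-uri() = 'http://www.w3.org/2001/XMLSchema-instance'"
    " and . = 'PredefinedSpeedZone_t']]",
    namespaces=ns)

print(len(by_name), len(by_uri))
```

For files that consistently use the xsi prefix, as Garmin's do in the question, the name() version works as written.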

Tomalak
