I'm trying to parse this XML. It's a YouTube feed. I'm working based on code in the tutorial. I want to get all the entry nodes that are nested under the feed.

from lxml import etree
root = etree.fromstring(text)
entries = root.xpath("/feed/entry")
print(entries)

For some reason entries is an empty list. Why?

mpenkov

2 Answers

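The feed element and all its children are in the http://www.w3.org/2005/Atom namespace, and an unprefixed name in an XPath expression only matches elements that are in no namespace. You need to tell your XPath query about the namespace: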

entries = root.xpath("/atom:feed/atom:entry", 
                     namespaces={'atom': 'http://www.w3.org/2005/Atom'})

or, if you want to change the default empty namespace:

entries = root.xpath("/feed/entry", 
                     namespaces={None: 'http://www.w3.org/2005/Atom'})

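or, if you don't want to use prefixes at all, use Clark notation. Note that xpath() does not accept this notation, but find() and findall() do; since root is already the feed element, the path is relative to it:

entries = root.findall("{http://www.w3.org/2005/Atom}entry")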
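To see the difference side by side, here is a minimal sketch. The inline document is a made-up stand-in for the real YouTube feed, and the variable names are only for illustration:

```python
from lxml import etree

# Hypothetical minimal Atom document standing in for the real YouTube feed.
text = b"""<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry><title>First video</title></entry>
  <entry><title>Second video</title></entry>
</feed>"""

root = etree.fromstring(text)
ns = {"atom": "http://www.w3.org/2005/Atom"}

# Unprefixed names in XPath 1.0 match only elements in no namespace,
# so this query finds nothing in an Atom document.
unqualified = root.xpath("/feed/entry")

# Binding a prefix (the name "atom" is arbitrary) to the Atom URI makes it match.
qualified = root.xpath("/atom:feed/atom:entry", namespaces=ns)

# Clark notation is understood by findall(), relative to the root element.
clark = root.findall("{http://www.w3.org/2005/Atom}entry")

print(len(unqualified), len(qualified), len(clark))
```

The prefix only has to agree between the expression and the namespaces dict; it does not need to appear in the XML itself.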

To my knowledge, the namespace of the node you're working with is implicitly assumed, so operations on children in the same namespace should not require you to specify it again. So you should be able to do something along the lines of:

feed = root.xpath("/atom:feed",
                  namespaces={'atom': 'http://www.w3.org/2005/Atom'})[0]

title = feed.xpath("title")
entries = feed.xpath("entry")
# etc...
Nils Werner
  • I think you could drop this namespace only if you are the author of this XML file. –  Aug 21 '13 at 11:28
  • You should not "drop the namespace" as there is a reason why Atom feeds are using it. I've added a few more examples that could make your life easier. – Nils Werner Aug 21 '13 at 11:36
  • Some XPath versions allow specifying "*" for any namespace, if I recall correctly? – BartoszKP Aug 21 '13 at 12:04
  • You can use `*[local-name()='feed']` to match an element named `feed` in any namespace. That is considered bad practice, though. – Nils Werner Aug 21 '13 at 12:26
  • @misha "Is there any way to avoid specifying the prefix?" Yes, use XPath 2.0. But that's not easy from Python. – Michael Kay Aug 21 '13 at 18:20
  • @NilsWerner Thank you for your helpful suggestions. Unfortunately, you can't use `None` for a prefix - lxml specifically prohibits that and raises an exception. This means that you cannot modify the default namespace. Also, I tried checking the implicit assumption of the local namespace, but it doesn't work as you describe. Unless I specify the namespace of each part of the xpath query explicitly, the search returns an empty list. – mpenkov Aug 22 '13 at 05:42
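The points raised in the comments above can be checked with a small sketch. The sample document and variable names are made up for illustration, and the exact exception type for a `None` prefix may depend on the lxml version (I believe it is a TypeError), so the code only records whatever is raised:

```python
from lxml import etree

ATOM = "http://www.w3.org/2005/Atom"

# Hypothetical minimal feed; the real YouTube feed is much larger.
text = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry><title>First video</title></entry>
</feed>"""
root = etree.fromstring(text)
ns = {"atom": ATOM}

# A None prefix is rejected by xpath(); record the exception type if one is raised.
try:
    root.xpath("/feed/entry", namespaces={None: ATOM})
    none_prefix_error = None
except Exception as exc:
    none_prefix_error = type(exc).__name__

# Relative queries from a namespaced element still need the prefix on every step.
bare_title = root.xpath("title")
prefixed_title = root.xpath("atom:title", namespaces=ns)

# local-name() ignores the namespace entirely (works, but matches any namespace).
any_ns_entries = root.xpath("*[local-name()='entry']")

print(none_prefix_error, len(bare_title), len(prefixed_title), len(any_ns_entries))
```

In practice, defining the prefix map once and passing it to every xpath() call keeps the queries short without resorting to local-name().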

It's because of the namespace in the XML. Here is an explanation: http://www.edankert.com/defaultnamespaces.html#Conclusion.
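One way to confirm that a default namespace is in play is to inspect the parsed root element. This is a rough sketch using a made-up one-line document; the prefix name "atom" is an arbitrary choice:

```python
from lxml import etree

# Hypothetical minimal document with a default (unprefixed) namespace declaration.
text = b'<feed xmlns="http://www.w3.org/2005/Atom"><entry/></feed>'
root = etree.fromstring(text)

# The tag carries the namespace in Clark notation: {uri}localname.
print(root.tag)

# nsmap lists the declarations in scope; the default namespace is keyed by None.
print(root.nsmap)

# QName splits the tag into its two parts.
qname = etree.QName(root)
print(qname.namespace, qname.localname)

# The URI can be reused to build a prefix map instead of hard-coding it.
ns = {"atom": root.nsmap[None]}
entries = root.xpath("/atom:feed/atom:entry", namespaces=ns)
print(len(entries))
```

If root.tag starts with a brace, every unprefixed name in an XPath expression against that document will fail to match until a prefix is bound to that URI.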

BartoszKP