How do I use a default namespace in an lxml xpath query?

Question

I have an XML document in the following format:

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/" xmlns:gsa="http://schemas.google.com/gsa/2007">
  ...
  <entry>
    <id>https://ip.ad.dr.ess:8000/feeds/diagnostics/smb://ip.ad.dr.ess/path/to/file</id>
    <updated>2011-11-07T21:32:39.795Z</updated>
    <app:edited xmlns:app="http://purl.org/atom/app#">2011-11-07T21:32:39.795Z</app:edited>
    <link rel="self" type="application/atom+xml" href="https://ip.ad.dr.ess:8000/feeds/diagnostics"/>
    <link rel="edit" type="application/atom+xml" href="https://ip.ad.dr.ess:8000/feeds/diagnostics"/>
    <gsa:content name="entryID">smb://ip.ad.dr.ess/path/to/directory</gsa:content>
    <gsa:content name="numCrawledURLs">7</gsa:content>
    <gsa:content name="numExcludedURLs">0</gsa:content>
    <gsa:content name="type">DirectoryContentData</gsa:content>
    <gsa:content name="numRetrievalErrors">0</gsa:content>
  </entry>
  <entry>
    ...
  </entry>
  ...
</feed>

I need to retrieve all `entry` elements using XPath in lxml. My problem is that I can't figure out how to use the default (unprefixed) namespace. I have tried the following, but none of them work. Please advise.

import lxml.etree as et

tree = et.fromstring(xml)

The various things I have tried are:

for node in tree.xpath('//entry'):

or

ns = {None: "http://www.w3.org/2005/Atom", "openSearch": "http://a9.com/-/spec/opensearchrss/1.0/", "gsa": "http://schemas.google.com/gsa/2007"}

for node in tree.xpath('//entry', namespaces=ns):

or

for node in tree.xpath('//\"{http://www.w3.org/2005/Atom}entry\"'):

At this point I just don't know what to try. Any help is greatly appreciated.

It really feels weird that it won't let you find tags in the default namespace just because that namespace wasn't assigned a prefix. I couldn't believe my eyes when I encountered this. – Ivan, Jun 23 '19 at 22:43
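The behavior comes from XPath 1.0 itself: an unprefixed name test such as `entry` only matches elements that are in no namespace, and XPath has no notion of a default namespace, so lxml cannot accept an empty (`None`) prefix in its `namespaces` mapping. A minimal sketch reproducing this, using a hypothetical two-entry feed trimmed down from the question's document:

```python
import lxml.etree as et

ATOM = "http://www.w3.org/2005/Atom"

# Hypothetical, trimmed-down version of the feed in the question.
xml = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>1</id></entry>
  <entry><id>2</id></entry>
</feed>"""

tree = et.fromstring(xml)

# Unprefixed name test: matches only elements in *no* namespace,
# so the Atom <entry> elements are not selected.
unprefixed = tree.xpath("//entry")

# Binding any prefix (here "atom") to the Atom URI makes them reachable.
prefixed = tree.xpath("//atom:entry", namespaces={"atom": ATOM})

print(len(unprefixed), len(prefixed))
```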

mzjn · Accepted Answer · 2011-11-08T20:19:19.297

50

Something like this should work:

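import lxml.etree as et

# XPath 1.0 has no default namespace, so bind the Atom namespace URI
# to an explicit prefix and use that prefix in the expression.
ns = {"atom": "http://www.w3.org/2005/Atom"}
tree = et.fromstring(xml)
for node in tree.xpath('//atom:entry', namespaces=ns):
    print(node)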

See also http://lxml.de/xpathxslt.html#namespaces-and-prefixes.
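If you would rather not hard-code the URIs, here is a sketch, assuming the namespaces are declared on the root element, that builds the prefix map from the root's `nsmap` (lxml's dict of in-scope declarations, where the default namespace appears under the key `None`) and re-binds the default URI to the prefix `atom`, a name chosen here for illustration. It also compiles the expression once with `et.XPath`, which carries the mapping with it. The sample document is hypothetical:

```python
import lxml.etree as et

# Hypothetical sample modeled on the question's feed.
xml = b"""<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gsa="http://schemas.google.com/gsa/2007">
  <entry>
    <gsa:content name="numCrawledURLs">7</gsa:content>
    <gsa:content name="type">DirectoryContentData</gsa:content>
  </entry>
  <entry>
    <gsa:content name="type">DirectoryContentData</gsa:content>
  </entry>
</feed>"""

tree = et.fromstring(xml)

# Copy every named prefix, dropping the None key that XPath rejects...
ns = {prefix: uri for prefix, uri in tree.nsmap.items() if prefix is not None}
# ...and re-bind the default namespace to a real prefix ("atom" is arbitrary).
ns["atom"] = tree.nsmap[None]

# Compiled XPath objects keep their namespace mapping.
find_entries = et.XPath("//atom:entry", namespaces=ns)
find_types = et.XPath("//atom:entry/gsa:content[@name='type']/text()", namespaces=ns)

entries = find_entries(tree)
types = find_types(tree)
print(len(entries), types)
```

The prefix in the expression still has to be spelled out; the sketch only avoids repeating the URI.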

Alternative:

# Match on the local name only, ignoring the namespace entirely.
for node in tree.xpath("//*[local-name() = 'entry']"):
    print(node)
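One caveat with `local-name()` is that it matches an element with that name in any namespace. A sketch, using a hypothetical document in which a second `entry` lives in an unrelated namespace (`urn:example:other` is made up for illustration), showing how adding `namespace-uri()` narrows the match:

```python
import lxml.etree as et

# Hypothetical document: one Atom <entry>, one <entry> in another namespace.
xml = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>1</id></entry>
  <entry xmlns="urn:example:other"><id>x</id></entry>
</feed>"""
tree = et.fromstring(xml)

# local-name() alone ignores namespaces, so both elements match.
any_ns = tree.xpath("//*[local-name() = 'entry']")

# Also checking namespace-uri() keeps only the Atom entry.
atom_only = tree.xpath(
    "//*[local-name() = 'entry' and namespace-uri() = 'http://www.w3.org/2005/Atom']"
)

print(len(any_ns), len(atom_only))  # 2 1
```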

edited Nov 08 '11 at 20:19

answered Nov 08 '11 at 19:49

mzjn



so there is no way to use a default namespace here? I ask because it makes it easier to use the actual tag as it appears in the document, which is `<entry>`, rather than `<atom:entry>` – ewok Nov 08 '11 at 19:59

It's important to note that `tree.xpath("atom:entry")` doesn't work, while in the non-namespaced document `tree.xpath("entry")` does work. You need the `//`, as in `tree.xpath("//atom:entry")`. – CodeMonkey Jul 01 '16 at 12:47

The `local-name` tip is a good one, for finding non-namespaced elements among namespaced ones. – ghukill Nov 20 '18 at 13:36

score 2 · Answer 2 · answered Nov 08 '11 at 16:24


Use the `findall` method:

# findall uses ElementPath, which accepts {namespace-uri}tag (Clark) notation.
for item in tree.findall('{http://www.w3.org/2005/Atom}entry'):
    print(item)
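Without a leading `.//`, `findall` only searches the direct children of the element it is called on, which works here because each `entry` is a child of `feed`. A sketch of a few related forms lxml supports, run against a hypothetical two-entry feed standing in for the question's document: a prefix mapping passed via `namespaces`, the `.//` descendant form, and `iter()`:

```python
import lxml.etree as et

ATOM = "http://www.w3.org/2005/Atom"

# Hypothetical stand-in for the question's feed.
xml = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>1</id></entry>
  <entry><id>2</id></entry>
</feed>"""
tree = et.fromstring(xml)

# Clark notation, direct children of <feed> only.
children = tree.findall("{%s}entry" % ATOM)

# findall also accepts a prefix mapping, much like xpath() does.
prefixed = tree.findall("atom:entry", namespaces={"atom": ATOM})

# './/' searches all descendants; iter() walks the whole subtree for one tag.
descendants = tree.findall(".//{%s}entry" % ATOM)
iterated = list(tree.iter("{%s}entry" % ATOM))

# Child text can be read the same way.
ids = [entry.findtext("{%s}id" % ATOM) for entry in children]
print(ids)  # ['1', '2']
```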


Seb



This is a useful workaround, but is it possible to use namespaces in an actual XPath expression, using `tree.xpath()`? – ewok Nov 08 '11 at 17:51
