It's a good question, and the answer is not easy to find.
The main difference is that local-name() ignores the prefix (and therefore the namespace) of a tag.
For example, given a node <x:html xmlns:x="http://www.w3.org/1999/xhtml"/>, a local-name() test will match the html tag, while //html will not match it, and //x:html fails unless the prefix x is bound to the namespace in the query.
Please consider the following code. If you have any questions, feel free to ask.
Show me the code
Setup:
from lxml.etree import fromstring
tree = fromstring('<x:html xmlns:x="http://www.w3.org/1999/xhtml"/>')
The plain tag selector no longer works, and the prefixed one fails because no prefix has been bound:
tree.xpath('//html')
# []
tree.xpath('//x:html')
# XPathEvalError: Undefined namespace prefix
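The second query fails only because the prefix is unknown to the XPath engine, not because the element cannot be selected by tag. A minimal sketch, assuming lxml's namespaces argument to xpath() (the same one used in the Performance section), and an arbitrary prefix "h" chosen just for this example:

```python
from lxml.etree import fromstring

tree = fromstring('<x:html xmlns:x="http://www.w3.org/1999/xhtml"/>')

# The prefix in the query is local to the query. "h" is an arbitrary
# choice for this sketch; only the namespace URI has to match the document.
ns = {"h": "http://www.w3.org/1999/xhtml"}
matches = tree.xpath("//h:html", namespaces=ns)

print(len(matches))
print(matches[0].tag)
```

The prefix in the query does not need to be the one used in the document, since the match is made on the namespace URI.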
But using local-name() we can still get the element without binding any prefix (the namespace is ignored):
tree.xpath('//*[local-name() = "html"]')
# [<Element {http://www.w3.org/1999/xhtml}html at 0x103b8d848>]
Or match the prefixed name exactly using name() (this depends on the prefix used in the document, not on the namespace URI):
tree.xpath('//*[name() = "x:html"]')
# [<Element {http://www.w3.org/1999/xhtml}html at 0x103b8d848>]
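If the goal is to be strict about the namespace itself rather than the prefix, one option is to combine local-name() with namespace-uri(). The sketch below is an illustration only: the document, the urn:example:other namespace and the variable name ns are made up for this example, and it assumes lxml's support for passing XPath variables as keyword arguments.

```python
from lxml.etree import fromstring

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical document: two elements named "html" in different namespaces.
doc = fromstring(
    '<root xmlns:x="http://www.w3.org/1999/xhtml" xmlns:o="urn:example:other">'
    "<x:html/><o:html/>"
    "</root>"
)

# local-name() alone ignores the namespace, so both elements match.
by_local = doc.xpath('//*[local-name() = "html"]')

# Adding namespace-uri() restricts the match to one namespace,
# still without binding any prefix. $ns is an XPath variable.
by_uri = doc.xpath(
    '//*[local-name() = "html" and namespace-uri() = $ns]', ns=XHTML_NS
)

print(len(by_local), len(by_uri))
```

This keeps the convenience of not declaring prefixes while avoiding accidental matches on same-named elements from other namespaces.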
Performance
I parsed this website as a tree and used the following queries:
%timeit tree.xpath('//*[local-name() = "div"]')
# 1000 loops, best of 3: 570 µs per loop
%timeit tree.xpath('//div')
# 10000 loops, best of 3: 44.4 µs per loop
Now onto actual namespaces. I parsed a block from here.
example = """ ... """
from lxml.etree import fromstring
tree = fromstring(example)
%timeit tree.xpath('//hr:author', namespaces={'hr': 'http://eric.van-der-vlist.com/ns/person'})
# 100000 loops, best of 3: 18.2 µs per loop
%timeit tree.xpath('//*[local-name() = "author"]')
# 10000 loops, best of 3: 37.7 µs per loop
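The %timeit lines above only run inside IPython. As a rough sketch of the same comparison in plain Python, the standard timeit module can be used. The document below is a hypothetical stand-in (the original block is not reproduced here), so absolute numbers will differ from the ones quoted above.

```python
import timeit

from lxml.etree import fromstring

NS = {"hr": "http://eric.van-der-vlist.com/ns/person"}

# Hypothetical stand-in document: 100 namespaced <author> elements.
doc = fromstring(
    '<library xmlns:hr="http://eric.van-der-vlist.com/ns/person">'
    + "<book><hr:author><hr:name>n</hr:name></hr:author></book>" * 100
    + "</library>"
)


def with_namespace():
    # Name test with a bound prefix.
    return doc.xpath("//hr:author", namespaces=NS)


def with_local_name():
    # Predicate on local-name(), evaluated for every element.
    return doc.xpath('//*[local-name() = "author"]')


# Both queries should select the same number of elements.
assert len(with_namespace()) == len(with_local_name())

for fn in (with_namespace, with_local_name):
    seconds = timeit.timeit(fn, number=1000)
    print(f"{fn.__name__}: {seconds / 1000 * 1e6:.1f} µs per call")
```

Timings depend on the document size and the machine, so the ratio between the two is more meaningful than either number.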
Conclusion
I had to rewrite the conclusion: after trying the namespace method, it became obvious that specifying the namespace also brings a gain. It is roughly 2 times faster to specify the namespace (which allows optimizations) than to use local-name().