lxml xpath returns an empty list

Question

<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" class="pc chrome win psc_dir-ltr psc_form-xlarge" dir="ltr" lang="en">
<title>Some Title</title>
</html>

If I run:

from lxml import etree
html = etree.parse('text.txt')
result = html.xpath('//title')
print(result)

I get an empty list. I guess it has something to do with the namespace, but I can't figure out how to fix it.

Are you using the xml or html tree builder? http://lxml.de/parsing.html — James Schinner, Jul 25 '17 at 05:37

James Schinner · Accepted Answer · 2017-07-25T06:03:52.860

Try building the tree with the HTML parser. Note that if text.txt is a file, its contents need to be read first:

with open('text.txt', 'r', encoding='utf8') as f:
    text_html = f.read()

Then build the tree from that string like this:

from lxml import etree, html

def build_lxml_tree(_html):
    tree = html.fromstring(_html)
    tree = etree.ElementTree(tree)
    return tree

tree = build_lxml_tree(text_html)
result = tree.xpath('//title')
print(result)

score 1 · Answer 2 · answered Jul 25 '17 at 05:57

You can also use the HTML parser:

from lxml import etree
parser = etree.HTMLParser()
html = etree.parse('text.txt', parser)
result = html.xpath('//title')
print(result)
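
As a rough illustration of why this works, here is a minimal sketch that parses the same markup from memory instead of from text.txt. The `io.BytesIO` wrapper and the variable names `data`, `root` and `titles` are assumptions made for the example, not part of the original answer. With the HTML parser the `xmlns` declaration does not put the elements into a namespace, so a plain `//title` matches:

```python
import io
from lxml import etree

# Same markup as in the question, held in memory (assumed stand-in for text.txt).
data = (
    b'<!DOCTYPE html>'
    b'<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en">'
    b'<title>Some Title</title>'
    b'</html>'
)

parser = etree.HTMLParser()
tree = etree.parse(io.BytesIO(data), parser)  # parse() accepts file-like objects
root = tree.getroot()

# The root tag carries no '{namespace}' prefix when parsed as HTML.
print(root.tag)

# An unprefixed XPath expression therefore finds the element.
titles = tree.xpath('//title')
print([t.text for t in titles])
```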

PRMoureu

score 1 · Answer 3 · answered Jul 25 '17 at 06:09

You can do it like this:

from lxml import etree
parser = etree.HTMLParser()
html = etree.parse('text.txt', parser)
result = html.xpath('//title/text()')
print(result)

The output is:

['Some Title']

youDaily

dicristina · Answer 4 · 2021-07-08T18:00:27.893

You can use the namespaces parameter of the xpath method like this:

from lxml import etree
html = etree.parse('text.txt')
result = html.xpath('//n:title', namespaces={'n': 'http://www.w3.org/1999/xhtml'})

According to the lxml documentation "[...] XPath does not have a notion of a default namespace. The empty prefix is therefore undefined for XPath and cannot be used in namespace prefix mappings", so if you are working with an element that has a default namespace you can explicitly define the namespace when calling xpath.
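
To make the difference concrete, here is a small sketch that parses the question's markup with the default XML parser from an in-memory string. The names `XHTML_NS`, `doc`, `plain` and `prefixed`, and the prefix `n`, are arbitrary choices for this example. It should show that the unprefixed expression matches nothing, while the prefixed one finds the title:

```python
from lxml import etree

XHTML_NS = 'http://www.w3.org/1999/xhtml'

# Same markup as in the question, as a string (assumed stand-in for text.txt).
doc = (
    '<!DOCTYPE html>'
    '<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en">'
    '<title>Some Title</title>'
    '</html>'
)
root = etree.fromstring(doc)  # default XML parser, namespaces are honoured

# The default namespace is stored under the key None in nsmap.
print(root.nsmap)

# Unprefixed names in XPath only match elements in no namespace.
plain = root.xpath('//title')
print(plain)

# Binding the namespace URI to an explicit prefix makes the match succeed.
prefixed = root.xpath('//n:title', namespaces={'n': XHTML_NS})
print([el.text for el in prefixed])
```

The prefix itself is only a local label; what matters is that it maps to the same URI as the document's `xmlns` attribute.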

For more information see this similar question with a great answer.
