How to use XPath in the lxml Python module

Question

I have an XML file like the one below:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
  <loc>https://ezinearticles.com/</loc>
  <changefreq>hourly</changefreq>
  <priority>1.0</priority>
 </url>
 <url>
  <loc>https://ezinearticles.com/submit/</loc>
  <changefreq>weekly</changefreq>
  <priority>0.3</priority>
 </url>
 ...................

I want to use XPath in the lxml module to get the URL from every <loc> tag. I implemented it with the code below, but it doesn't work: the result is an empty list.

from lxml import etree
parser = etree.XMLParser(ns_clean=True)
xmlfile = "sitemap1.xml"
xmlobj = etree.parse(xmlfile, parser)

loc = xmlobj.xpath('//loc[text()]')

print(loc)

Can anyone help me fix my script?

...granted, this question isn't about XHTML, but it's the exact same problem (just with two different namespaces). — Charles Duffy, Jul 05 '16 at 17:54

score 1 · Answer 1 · answered Jul 05 '16 at 17:36

# define a namespace map
nsmap={'s': 'http://www.sitemaps.org/schemas/sitemap/0.9'}

# use it in your query
loc = xmlobj.xpath('//s:loc[text()]', namespaces=nsmap)

In your original code, you were looking for a loc in no namespace (an unprefixed name in an XPath expression never picks up the document's default namespace), but the element is actually a {http://www.sitemaps.org/schemas/sitemap/0.9}loc, because the xmlns= on urlset puts everything below it in that namespace by default. That is why the original query didn't match.
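To see the mismatch directly, here is a minimal sketch that parses a two-entry copy of the sitemap from an inline string (the SITEMAP constant is an assumption for illustration; the question reads sitemap1.xml from disk) and compares the unprefixed and prefixed queries:

```python
from lxml import etree

# Hypothetical inline copy of the question's sitemap, trimmed to two entries.
SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
  <loc>https://ezinearticles.com/</loc>
  <changefreq>hourly</changefreq>
  <priority>1.0</priority>
 </url>
 <url>
  <loc>https://ezinearticles.com/submit/</loc>
  <changefreq>weekly</changefreq>
  <priority>0.3</priority>
 </url>
</urlset>"""

root = etree.fromstring(SITEMAP)

# The prefix 's' is arbitrary; only the namespace URI has to match the document.
nsmap = {'s': 'http://www.sitemaps.org/schemas/sitemap/0.9'}

# Unprefixed name: matches only elements in no namespace, so nothing is found.
unprefixed = root.xpath('//loc')

# Prefixed name: matches the namespaced elements.
prefixed = root.xpath('//s:loc', namespaces=nsmap)

# lxml reports the tag in {namespace-uri}localname form.
tags = [el.tag for el in prefixed]

# Appending /text() returns the URL strings instead of element objects.
urls = root.xpath('//s:loc/text()', namespaces=nsmap)

print(unprefixed)
print(tags)
print(urls)
```

The first list comes back empty, while the tags of the prefixed matches show the full {namespace}loc name that the query has to target.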

answered Jul 05 '16 at 17:36

Charles Duffy

I tried to get the loc with priority = 1 using this code: loc = xmlobj.xpath('//s:url[priority=1]/loc/text()', namespaces=nsmap), but I get an empty string. Do you know why? – Le Truong Sinh Jul 06 '16 at 16:08
`//s:url[s:priority=1]/s:loc/text()`, assuming that everything but the namespaces is right. – Charles Duffy Jul 06 '16 at 16:41
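As a rough illustration of that comment, the sketch below runs both versions of the predicate query against a two-entry inline copy of the sitemap (the SITEMAP constant and the variable names are assumptions for this example, not part of the original script). Every step of the path, including the name inside the predicate, needs the prefix:

```python
from lxml import etree

# Hypothetical inline copy of the question's sitemap, trimmed to two entries.
SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
  <loc>https://ezinearticles.com/</loc>
  <changefreq>hourly</changefreq>
  <priority>1.0</priority>
 </url>
 <url>
  <loc>https://ezinearticles.com/submit/</loc>
  <changefreq>weekly</changefreq>
  <priority>0.3</priority>
 </url>
</urlset>"""

root = etree.fromstring(SITEMAP)
nsmap = {'s': 'http://www.sitemaps.org/schemas/sitemap/0.9'}

# Only the first step is prefixed: 'priority' and 'loc' refer to elements
# in no namespace, which do not exist in this document.
unmatched = root.xpath('//s:url[priority=1]/loc/text()', namespaces=nsmap)

# Every element name is prefixed. The predicate compares the element's
# text ("1.0") with the number 1 after XPath's numeric conversion.
matched = root.xpath('//s:url[s:priority=1]/s:loc/text()', namespaces=nsmap)

print(unmatched)
print(matched)
```

Only the fully prefixed query returns the URL of the entry whose priority is 1.0.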
