I have the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<record xmlns:mxc="info:lc/xmlns/marcxchange-v2" xmlns:srw="http://www.loc.gov/zing/srw/" xmlns="http://catalogue.bnf.fr/namespaces/InterXMarc" xmlns:ixm="http://catalogue.bnf.fr/namespaces/InterXMarc" xmlns:mn="http://catalogue.bnf.fr/namespaces/motsnotices" xmlns:sd="http://www.loc.gov/zing/srw/diagnostic/" format="INTERMARC" id="ark:/12148/cb14816776x" type="Authority">
<leader>00987c0 ap2200027 45 </leader>
<controlfield tag="001">FRBNF148167768</controlfield>
<datafield tag="031" ind1=" " ind2=" ">
<subfield code="a">0000000081282943</subfield>
<subfield code="d">20130802</subfield>
</datafield>
</record>
As you may notice, namespaces are declared but not used.
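One way to check whether those declarations are really inert is to print the element names lxml assigns after parsing. Below is a minimal sketch, assuming the record is embedded as a bytes literal (trimmed to `leader` and `controlfield`) instead of being read from a file:

```python
from lxml import etree

# Trimmed copy of the record above; the start tag keeps all of its declarations.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<record xmlns:mxc="info:lc/xmlns/marcxchange-v2" xmlns:srw="http://www.loc.gov/zing/srw/" xmlns="http://catalogue.bnf.fr/namespaces/InterXMarc" xmlns:ixm="http://catalogue.bnf.fr/namespaces/InterXMarc" xmlns:mn="http://catalogue.bnf.fr/namespaces/motsnotices" xmlns:sd="http://www.loc.gov/zing/srw/diagnostic/" format="INTERMARC" id="ark:/12148/cb14816776x" type="Authority">
<leader>00987c0 ap2200027 45 </leader>
<controlfield tag="001">FRBNF148167768</controlfield>
</record>"""

root = etree.XML(SAMPLE)

# lxml reports element names in Clark notation: {namespace-uri}local-name.
# An element in no namespace would print as a bare name such as "controlfield".
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)
```

If every tag prints with a `{...}` prefix, the elements are in that namespace: an unprefixed `xmlns="..."` declaration applies to the element carrying it and to every unprefixed element below it.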
I wrote the following Python script:
from lxml import etree
# path points to the XML file shown above
with open(path, "r", encoding="utf8") as file_bib:
    data = file_bib.read().encode()
dataxml = etree.XML(data)
# record identifier (field 001)
recordId = dataxml.xpath(".//controlfield[@tag='001']/text()")[0]
print(recordId)
When I run my script, I get the following error:
recordId = dataxml.xpath(".//controlfield[@tag='001']/text()")[0]
IndexError: list index out of range
When I remove all the namespace declarations from the record tag, the script works fine. I guess I could remove the namespaces programmatically, but I'd prefer to understand why my script doesn't work in the first place.
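To narrow the problem down, the failing query can be compared with two namespace-aware variants on the same record. This is a minimal sketch, assuming the record is embedded as a bytes literal and keeping only the default and `ixm` declarations; the prefix `x` is an arbitrary illustrative name, and the only requirement is that it maps to the InterXMarc URI:

```python
from lxml import etree

# Reduced copy of the record; only the namespace declarations matter here.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://catalogue.bnf.fr/namespaces/InterXMarc"
        xmlns:ixm="http://catalogue.bnf.fr/namespaces/InterXMarc"
        format="INTERMARC" type="Authority">
<leader>00987c0 ap2200027 45 </leader>
<controlfield tag="001">FRBNF148167768</controlfield>
</record>"""

root = etree.XML(SAMPLE)

# 1. Unprefixed name test: in XPath 1.0 this matches only elements in *no*
#    namespace, so it finds nothing in this document.
plain = root.xpath(".//controlfield[@tag='001']/text()")

# 2. Bind an arbitrary prefix ("x") to the default namespace URI and use it.
NS = {"x": "http://catalogue.bnf.fr/namespaces/InterXMarc"}
prefixed = root.xpath(".//x:controlfield[@tag='001']/text()", namespaces=NS)

# 3. Ignore namespaces by matching on the local name only.
by_local_name = root.xpath(".//*[local-name()='controlfield'][@tag='001']/text()")

print(plain)          # []
print(prefixed)       # ['FRBNF148167768']
print(by_local_name)  # ['FRBNF148167768']
```

Passing `namespaces=root.nsmap` directly would not help here, because lxml's XPath does not accept the `None` key that `nsmap` uses for the default namespace. Between variants 2 and 3, the explicit prefix mapping is stricter, since `local-name()` would also match a `controlfield` element from an unrelated namespace.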
Thanks!
recordId = dataxml.xpath(".//x:controlfield[@tag='001']/text()", namespaces={'x':'http://catalogue.bnf.fr/namespaces/InterXMarc'})[0] doesn't work – Catalaburro Sep 09 '21 at 06:54