I have the following XML file:

    <?xml version="1.0" encoding="UTF-8"?>
    <record xmlns:mxc="info:lc/xmlns/marcxchange-v2" xmlns:srw="http://www.loc.gov/zing/srw/" xmlns="http://catalogue.bnf.fr/namespaces/InterXMarc" xmlns:ixm="http://catalogue.bnf.fr/namespaces/InterXMarc" xmlns:mn="http://catalogue.bnf.fr/namespaces/motsnotices" xmlns:sd="http://www.loc.gov/zing/srw/diagnostic/" format="INTERMARC" id="ark:/12148/cb14816776x" type="Authority">
       <leader>00987c0 ap2200027   45  </leader>
       <controlfield tag="001">FRBNF148167768</controlfield>
       <datafield tag="031" ind1=" " ind2=" ">
          <subfield code="a">0000000081282943</subfield>
          <subfield code="d">20130802</subfield>
       </datafield>
    </record>

As you may notice, namespaces are declared but not used.

I wrote the following Python script:

    from lxml import etree

    # `path` points to the XML file shown above
    with open(path, "r", encoding="utf8") as file_bib:
        data = file_bib.read().encode()
        dataxml = etree.XML(data)

    # id
    recordId = dataxml.xpath(".//controlfield[@tag='001']/text()")[0]
    print(recordId)

When I run my script, I get the following error:

    recordId = dataxml.xpath(".//controlfield[@tag='001']/text()")[0]
    IndexError: list index out of range

When I remove all the namespace declarations from the tag, the parsing works fine. I guess I could remove the namespaces programmatically, but I'd prefer to understand why my script doesn't work in the first place.

Thanks !

  • The default namespace is used: `xmlns="http://catalogue.bnf.fr/namespaces/InterXMarc"` – mzjn Sep 09 '21 at 06:43
  • Ok, but how do I know it's the default namespace, and how do I declare it?
    `recordId = dataxml.xpath(".//x:controlfield[@tag='001']/text()", namespaces={'x':'http://catalogue.bnf.fr/namespaces/InterXMarc'})[0]` doesn't work
    – Catalaburro Sep 09 '21 at 06:54
  • A declaration of a default namespace in XML does not define a prefix, like the others do. But in the Python code, you still have to define a prefix:uri mapping and use it in the call to `xpath()`. But it seems you know that already. And please avoid code in comments. Edit the question instead. – mzjn Sep 09 '21 at 06:57
  • Here is a similar question: https://stackoverflow.com/q/8053568/407651 – mzjn Sep 09 '21 at 07:03
  • Yeah, and I must have made a typo or something because `recordId = dataxml.xpath(".//x:controlfield[@tag='001']/text()", namespaces={'x':'http://catalogue.bnf.fr/namespaces/InterXMarc'})[0]` does work. Thanks for your help – Catalaburro Sep 09 '21 at 07:17
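
To make the point from the comments concrete, here is a minimal sketch, assuming lxml is installed. It uses a trimmed-down inline copy of the record (not the full file) and shows that the default namespace appears in `nsmap` under the key `None`, that an unprefixed XPath name matches nothing, and that binding the URI to a prefix fixes the query. The prefix `x` and the variable names are arbitrary choices for illustration.

```python
from lxml import etree

# Trimmed-down copy of the record from the question, inlined so the sketch is self-contained.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<record xmlns:mxc="info:lc/xmlns/marcxchange-v2"
        xmlns="http://catalogue.bnf.fr/namespaces/InterXMarc"
        format="INTERMARC" id="ark:/12148/cb14816776x" type="Authority">
  <controlfield tag="001">FRBNF148167768</controlfield>
</record>"""

root = etree.XML(XML)

# The default namespace is the nsmap entry whose prefix is None.
default_ns = root.nsmap[None]

# Unprefixed names in an XPath expression only match elements in no namespace,
# so this query finds nothing.
unprefixed = root.xpath(".//controlfield[@tag='001']/text()")

# Bind the URI to a prefix of your choosing ("x" here) and use it in the expression.
prefixed = root.xpath(
    ".//x:controlfield[@tag='001']/text()",
    namespaces={"x": default_ns},
)

print(default_ns)
print(unprefixed)
print(prefixed)
```

Note that `root.nsmap` itself cannot be passed as `namespaces=` here, because `xpath()` does not accept the `None` key; hence the explicit `{"x": default_ns}` mapping.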

1 Answer

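Thanks to the comments, here is the right way to deal with a default namespace. The `xmlns="..."` declaration on the root puts every unprefixed element of the document into that namespace, but XPath has no notion of a default namespace: an unprefixed name in an expression only matches elements in no namespace. So the namespace URI has to be bound to a prefix (any prefix, here `x`) in the call to `xpath()`, and that prefix has to be used in the expression: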

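    from lxml import etree

    # `path` points to the XML file from the question
    with open(path, "r", encoding="utf8") as file_bib:
        data = file_bib.read().encode()
        dataxml = etree.XML(data)
        print(dataxml)

    # id: map the default namespace URI to the prefix "x" and use it in the expression
    recordId = dataxml.xpath(".//x:controlfield[@tag='001']/text()", namespaces={'x': 'http://catalogue.bnf.fr/namespaces/InterXMarc'})[0]
    print(recordId)
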
  • Please provide additional details in your answer. As it's currently written, it's hard to understand your solution. – Community Sep 09 '21 at 07:22
  • Note that the other namespace declarations (with prefixes) are not "messing" with your parsing. They have no effect at all. – mzjn Sep 09 '21 at 07:30
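
The last comment can be checked with a small sketch. As an assumption to keep it dependency-free, it uses the standard library's `xml.etree.ElementTree` rather than lxml (both expose element names in the same `{uri}local` notation), and the two inline documents and the helper `first_controlfield` are hypothetical, made up for illustration. It parses the same record with and without an extra prefixed declaration and compares the results.

```python
import xml.etree.ElementTree as ET

NS = "http://catalogue.bnf.fr/namespaces/InterXMarc"


def first_controlfield(xml_text):
    """Return the root tag and the text of the 001 control field."""
    root = ET.fromstring(xml_text)
    # The prefix "x" is bound to the default namespace URI for this query only.
    node = root.find(".//x:controlfield[@tag='001']", {"x": NS})
    return root.tag, (node.text if node is not None else None)


# Same record, once with an unused prefixed declaration and once without it.
with_extra = (
    '<record xmlns:srw="http://www.loc.gov/zing/srw/" xmlns="%s">'
    '<controlfield tag="001">FRBNF148167768</controlfield></record>' % NS
)
without_extra = (
    '<record xmlns="%s">'
    '<controlfield tag="001">FRBNF148167768</controlfield></record>' % NS
)

result_with = first_controlfield(with_extra)
result_without = first_controlfield(without_extra)

print(result_with)
print(result_with == result_without)
```

Only the default `xmlns="..."` declaration changes the element names; the unused `xmlns:srw` declaration leaves both the tags and the query result untouched.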