Declared but unused namespaces messing with my parsing

Question

I have the following XML file:

    <?xml version="1.0" encoding="UTF-8"?>
    <record xmlns:mxc="info:lc/xmlns/marcxchange-v2" xmlns:srw="http://www.loc.gov/zing/srw/" xmlns="http://catalogue.bnf.fr/namespaces/InterXMarc" xmlns:ixm="http://catalogue.bnf.fr/namespaces/InterXMarc" xmlns:mn="http://catalogue.bnf.fr/namespaces/motsnotices" xmlns:sd="http://www.loc.gov/zing/srw/diagnostic/" format="INTERMARC" id="ark:/12148/cb14816776x" type="Authority">
       <leader>00987c0 ap2200027   45  </leader>
       <controlfield tag="001">FRBNF148167768</controlfield>
       <datafield tag="031" ind1=" " ind2=" ">
          <subfield code="a">0000000081282943</subfield>
          <subfield code="d">20130802</subfield>
       </datafield>
    </record>

As you may notice, namespaces are declared but not used.

I wrote the following Python script:

    from lxml import etree

    with open(path, "r", encoding="utf8") as file_bib:
        data = file_bib.read().encode()
        dataxml = etree.XML(data)

    # id
    recordId = dataxml.xpath(".//controlfield[@tag='001']/text()")[0]
    print(recordId)

When I launch my script, I get the following error:

    recordId = dataxml.xpath(".//controlfield[@tag='001']/text()")[0]
    IndexError: list index out of range

When I remove all the namespace declarations from the tag, the parsing works fine. I guess I could remove the namespaces programmatically, but I'd prefer to understand why my script doesn't work in the first place.

Thanks !

The default namespace is used: `xmlns="http://catalogue.bnf.fr/namespaces/InterXMarc"` — mzjn, Sep 09 '21 at 06:43
OK, but how do I know it's the default namespace, and how do I declare it? `recordId = dataxml.xpath(".//x:controlfield[@tag='001']/text()", namespaces={'x':'http://catalogue.bnf.fr/namespaces/InterXMarc'})[0]` doesn't work – Catalaburro, Sep 09 '21 at 06:54
A declaration of a default namespace in XML does not define a prefix, like the others do. But in the Python code, you still have to define a prefix:uri mapping and use it in the call to `xpath()`. But it seems you know that already. And please avoid code in comments. Edit the question instead. — mzjn, Sep 09 '21 at 06:57
Here is a similar question: https://stackoverflow.com/q/8053568/407651 — mzjn, Sep 09 '21 at 07:03
Yeah, and I must have made a typo or something, because `recordId = dataxml.xpath(".//x:controlfield[@tag='001']/text()", namespaces={'x':'http://catalogue.bnf.fr/namespaces/InterXMarc'})[0]` does work. Thanks for your help – Catalaburro, Sep 09 '21 at 07:17
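To make the point from these comments concrete, here is a minimal sketch of how to see that the elements are in the default namespace, and how a prefix mapping picks them up. It is an illustration under assumptions, not the asker's script: it uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra installs, the XML is a trimmed copy of the record from the question, and the prefix `x` is an arbitrary choice. The same namespace rule applies to lxml's `xpath()`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record from the question; only the default namespace is kept.
XML = (
    '<record xmlns="http://catalogue.bnf.fr/namespaces/InterXMarc">'
    '<controlfield tag="001">FRBNF148167768</controlfield>'
    '</record>'
)

root = ET.fromstring(XML)

# The parser stores the namespace URI inside each tag name ("{uri}localname"),
# which is how you can tell that the unprefixed elements are namespaced.
print(root.tag)

# An unqualified name only matches elements in no namespace, so nothing is found.
print(root.find(".//controlfield[@tag='001']"))

# 'x' is an arbitrary prefix chosen for this sketch; only the URI has to match.
ns = {"x": "http://catalogue.bnf.fr/namespaces/InterXMarc"}
record_id = root.find(".//x:controlfield[@tag='001']", ns).text
print(record_id)
```

The prefix used in the query does not need to appear anywhere in the XML file; the lookup is done on the namespace URI.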

score 0 · Answer 1 · answered Sep 09 '21 at 07:20

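Thanks to the comments, here is how to handle a document that declares a default namespace. Elements written without a prefix belong to that namespace, and XPath has no notion of a default namespace, so the expression must use a prefix that is mapped to the namespace URI through the `namespaces` argument (the prefix name itself, here `x`, is arbitrary):

    from lxml import etree

    with open(path, "r", encoding="utf8") as file_bib:
        data = file_bib.read().encode()
        dataxml = etree.XML(data)
        print(dataxml)

    # id: 'x' is mapped to the document's default namespace URI
    ns = {'x': 'http://catalogue.bnf.fr/namespaces/InterXMarc'}
    recordId = dataxml.xpath(".//x:controlfield[@tag='001']/text()", namespaces=ns)[0]
    print(recordId)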

answered Sep 09 '21 at 07:20 by Catalaburro

Please provide additional details in your answer. As it's currently written, it's hard to understand your solution. – Community Sep 09 '21 at 07:22
Note that the other namespace declarations (with prefixes) are not "messing" with your parsing. They have no effect at all. – mzjn Sep 09 '21 at 07:30
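The last comment can be checked with a short sketch. It is an illustration only, using the standard library's `xml.etree.ElementTree` rather than lxml, with two trimmed variants of the record from the question; the helper name `qualified_tags` is made up for this example. One variant keeps some of the prefixed declarations that nothing uses, the other keeps only the default namespace.

```python
import xml.etree.ElementTree as ET

DEFAULT_NS = "http://catalogue.bnf.fr/namespaces/InterXMarc"

# Variant with prefixed declarations that no element or attribute uses.
with_unused = (
    '<record xmlns:srw="http://www.loc.gov/zing/srw/" '
    'xmlns:mn="http://catalogue.bnf.fr/namespaces/motsnotices" '
    f'xmlns="{DEFAULT_NS}">'
    '<controlfield tag="001">FRBNF148167768</controlfield></record>'
)

# Variant that declares only the default namespace.
default_only = (
    f'<record xmlns="{DEFAULT_NS}">'
    '<controlfield tag="001">FRBNF148167768</controlfield></record>'
)


def qualified_tags(xml_text):
    """Return the namespace-qualified tag of every element (hypothetical helper)."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


tags_a = qualified_tags(with_unused)
tags_b = qualified_tags(default_only)

# Both documents yield the same qualified element names, so the same
# prefix-mapped query works on either of them.
print(tags_a == tags_b)
```

Only the `xmlns="..."` declaration changes how the unprefixed elements are named; the prefixed declarations stay inert until a prefix is actually used.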
