2

I have some XML that contains multiple elements with the same name, each in a different language, for example:

<Title xml:lang="FR" type="main">Les Tudors</Title>
<Title xml:lang="DE" type="main">Die Tudors</Title>
<Title xml:lang="IT" type="main">The Tudors</Title>

Normally, I'd retrieve an element using its attributes as follows:

titlex = info.find('.//xmlns:Title[@someattribute=attributevalue]', namespaces=nsmap)

If I try to do this with [@xml:lang="FR"] (for example), I get the following traceback:

  File "D:/Python code/RBM CRID, Title, Genre/CRID, Title, Genre, Age rating, Episode Number, Descriptions V1.py", line 29, in <module>
    titlex = info.find('.//xmlns:Title[@xml:lang=PL]', namespaces=nsmap) 

  File "lxml.etree.pyx", line 1457, in lxml.etree._Element.find (src\lxml\lxml.etree.c:51435)

  File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 282, in find
    it = iterfind(elem, path, namespaces)

  File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 272, in iterfind
    selector = _build_path_iterator(path, namespaces)

  File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 256, in _build_path_iterator
    selector.append(ops[token[0]](_next, token))

  File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 134, in prepare_predicate
    token = next()

  File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 80, in xpath_tokenizer
    raise SyntaxError("prefix %r not found in prefix map" % prefix)

SyntaxError: prefix 'xml' not found in prefix map

I'm not surprised by this, but I'd like suggestions on how to get around the issue.

Thanks!

As requested, here is a cut-down but complete version of the code (it works as expected if I remove the [bitsinsquarebrackets]):

import lxml
import codecs

file_name = (input('Enter the file name, excluding .xml extension: ') + '.xml')# User inputs file name
print('Parsing ' + file_name)


#----- Sets up import and namespace

from lxml import etree

parser = lxml.etree.XMLParser()


tree = lxml.etree.parse(file_name, parser)                                 # Name of file to test goes here
root = tree.getroot()

nsmap = {'xmlns': 'urn:tva:metadata:2012',
         'mpeg7': 'urn:tva:mpeg7:2008'}

#----- This code writes the output to a file

with codecs.open(file_name+'.log', mode='w', encoding='utf-8') as f:                        # Name the output file
    f.write(u'CRID|Title|Genre|Rating|Short Synopsis|Medium Synopsis|Long Synopsis\n')
    for info in root.xpath('//xmlns:ProgramInformation', namespaces=nsmap):
       titlex = info.find('.//xmlns:Title[xml:lang="PL"]', namespaces=nsmap)             # Retrieve the title
       title = titlex.text if titlex is not None else 'Missing'         # If there isn't a title, print an alternative word
       f.write(u'{}\n'.format(title))                     # Write all the retrieved values to the same line with bar separators and a new line
mzjn
Nick
  • Can you show the exact code that led to this error? It seems the `nsmap` dictionary you are defining does not contain an entry for `xml`. – Anand S Kumar Jul 06 '15 at 16:20
  • Code updated above. The xml files don't seem to use xml:lang as a namespace (but xml:lang="PL" does appear in the top level in ) – Nick Jul 06 '15 at 16:42

2 Answers

3

The xml prefix in xml:lang does not need to be declared in an XML document, but if you want to use xml:lang in XPath lookups, you have to define a prefix mapping in the Python code.

The xml prefix is reserved (as opposed to "normal" namespace prefixes which are arbitrary) and defined to be bound to http://www.w3.org/XML/1998/namespace. See the Namespaces in XML 1.0 W3C recommendation.

Example:

from lxml import etree

# Required mapping
nsmap = {"xml": "http://www.w3.org/XML/1998/namespace"}
 
XML = """
<root>
  <Title xml:lang="FR" type="main">Les Tudors</Title>
  <Title xml:lang="DE" type="main">Die Tudors</Title>
  <Title xml:lang="IT" type="main">The Tudors</Title>
</root>"""
 
doc = etree.fromstring(XML)
 
title_FR = doc.find('Title[@xml:lang="FR"]', namespaces=nsmap)
print(title_FR.text)

Output:

Les Tudors

If there is no mapping for the xml prefix, you get the "prefix 'xml' not found in prefix map" error. If the URI mapped to the xml prefix is not http://www.w3.org/XML/1998/namespace, the find method in the code snippet above raises no error but returns None, because no attribute matches.
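The three cases can be checked side by side with a rough sketch like the one below. It falls back to the standard library's xml.etree.ElementTree when lxml is not installed, on the assumption that both resolve prefixes in find() the same way for this path. The `urn:example:wrong` URI and the variable names are made up for illustration.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib ElementTree handles find() prefixes the same way
    import xml.etree.ElementTree as etree

# URI that the reserved xml prefix is bound to
XML_NS = "http://www.w3.org/XML/1998/namespace"

doc = etree.fromstring(
    '<root>'
    '<Title xml:lang="FR" type="main">Les Tudors</Title>'
    '<Title xml:lang="DE" type="main">Die Tudors</Title>'
    '</root>'
)

path = 'Title[@xml:lang="FR"]'

# Case 1: correct mapping, the element is found
found = doc.find(path, namespaces={"xml": XML_NS})
print(found.text)  # Les Tudors

# Case 2: no mapping at all, the path cannot be compiled
try:
    doc.find(path)
    missing_error = None
except SyntaxError as exc:
    missing_error = str(exc)
print(missing_error)

# Case 3: prefix mapped to a made-up URI, no error but no match either
not_found = doc.find(path, namespaces={"xml": "urn:example:wrong"})
print(not_found)  # None
```

Case 3 is the one to watch for in practice, since a typo in the URI fails silently instead of raising.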

mzjn
  • Was this answer helpful? – mzjn Aug 05 '15 at 16:36
  • 1
    Yes, this answer was extremely helpful. Can you provide any insight into why, if the `xml` namespace is reserved and does not need to be declared, ElementTree does not automagically map it? Is there any case in which mapping the xml namespace by default would be a problem? – Dakota Jul 02 '16 at 08:33
  • @Dakota: Good questions. I don't have any answers (for now at least). – mzjn Jul 02 '16 at 19:50
0

If you have control over the XML file, you could change the xml:lang attribute to lang.

If you do not have that control, I would suggest adding xml to the nsmap, like this:

nsmap = {'xmlns': 'urn:tva:metadata:2012',
         'mpeg7': 'urn:tva:mpeg7:2008',
         'xml': 'http://www.w3.org/XML/1998/namespace'}
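
A minimal sketch of this second option, using the reserved URI http://www.w3.org/XML/1998/namespace for the `xml` entry. The sample document is hypothetical (the real file was not posted); only the element names and namespace URIs are taken from the question. The sketch uses iterfind() rather than xpath() so that it also runs on the standard library's ElementTree if lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib ElementTree handles these paths the same way
    import xml.etree.ElementTree as etree

nsmap = {
    "xmlns": "urn:tva:metadata:2012",
    "mpeg7": "urn:tva:mpeg7:2008",
    "xml": "http://www.w3.org/XML/1998/namespace",  # reserved xml prefix
}

# Hypothetical sample data; the root element name is invented
SAMPLE = (
    '<TVAMain xmlns="urn:tva:metadata:2012">'
    '<ProgramInformation>'
    '<Title xml:lang="PL" type="main">Dynastia Tudorow</Title>'
    '<Title xml:lang="FR" type="main">Les Tudors</Title>'
    '</ProgramInformation>'
    '<ProgramInformation>'
    '<Title xml:lang="FR" type="main">Les Tudors</Title>'
    '</ProgramInformation>'
    '</TVAMain>'
)

root = etree.fromstring(SAMPLE)

titles = []
for info in root.iterfind(".//xmlns:ProgramInformation", namespaces=nsmap):
    # Note the @ in front of xml:lang: it selects an attribute, not a child
    titlex = info.find('.//xmlns:Title[@xml:lang="PL"]', namespaces=nsmap)
    titles.append(titlex.text if titlex is not None else "Missing")

print(titles)  # ['Dynastia Tudorow', 'Missing']
```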
Anand S Kumar