I have some XML that contains multiple elements with the same name, each in a different language, for example:
<Title xml:lang="FR" type="main">Les Tudors</Title>
<Title xml:lang="DE" type="main">Die Tudors</Title>
<Title xml:lang="IT" type="main">The Tudors</Title>
Normally, I'd retrieve an element using its attributes as follows:
titlex = info.find('.//xmlns:Title[@someattribute="attributevalue"]', namespaces=nsmap)
If I try to do this with [@xml:lang="FR"] (for example), I get the following traceback:
File "D:/Python code/RBM CRID, Title, Genre/CRID, Title, Genre, Age rating, Episode Number, Descriptions V1.py", line 29, in <module>
titlex = info.find('.//xmlns:Title[@xml:lang=PL]', namespaces=nsmap)
File "lxml.etree.pyx", line 1457, in lxml.etree._Element.find (src\lxml\lxml.etree.c:51435)
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 282, in find
it = iterfind(elem, path, namespaces)
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 272, in iterfind
selector = _build_path_iterator(path, namespaces)
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 256, in _build_path_iterator
selector.append(ops[token[0]](_next, token))
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 134, in prepare_predicate
token = next()
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 80, in xpath_tokenizer
raise SyntaxError("prefix %r not found in prefix map" % prefix)
SyntaxError: prefix 'xml' not found in prefix map
I'm not surprised by this, but I'd like suggestions on how to get around the issue.
Thanks!
As requested, here is a cut-down but complete version of the code (it works as expected if I remove the predicate in square brackets):
import lxml
import codecs
file_name = input('Enter the file name, excluding .xml extension: ') + '.xml'  # User inputs file name
print('Parsing ' + file_name)
#----- Sets up import and namespace
from lxml import etree
parser = lxml.etree.XMLParser()
tree = lxml.etree.parse(file_name, parser)  # Name of file to test goes here
root = tree.getroot()
nsmap = {'xmlns': 'urn:tva:metadata:2012',
         'mpeg7': 'urn:tva:mpeg7:2008'}
#----- This code writes the output to a file
with codecs.open(file_name + '.log', mode='w', encoding='utf-8') as f:  # Name the output file
    f.write(u'CRID|Title|Genre|Rating|Short Synopsis|Medium Synopsis|Long Synopsis\n')
    for info in root.xpath('//xmlns:ProgramInformation', namespaces=nsmap):
        # Retrieve the title; this is the call that raises the SyntaxError above
        titlex = info.find('.//xmlns:Title[@xml:lang="PL"]', namespaces=nsmap)
        title = titlex.text if titlex is not None else 'Missing'  # If there isn't a title, use an alternative word
        f.write(u'{}\n'.format(title))  # Write all the retrieved values to the same line with bar separators and a new line
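For reference, one workaround that seems to fit the error message is to add the reserved xml prefix to the prefix map myself, since find() only resolves prefixes it is given. The sketch below is not taken from my real script: the sample document, the XML_NS constant and the variable names are made up for illustration, and the fallback to the standard library's ElementTree is only there so the snippet runs without lxml installed. The namespace URI itself is the one the XML specification reserves for the xml prefix.

```python
try:
    from lxml import etree  # same library as in the question
except ImportError:  # assumption: stdlib ElementTree resolves prefixes the same way
    import xml.etree.ElementTree as etree

# Namespace URI permanently bound to the "xml" prefix by the XML Namespaces spec.
XML_NS = 'http://www.w3.org/XML/1998/namespace'

# Hypothetical cut-down document using the same default namespace as above.
SAMPLE = (
    '<ProgramInformation xmlns="urn:tva:metadata:2012">'
    '<Title xml:lang="FR" type="main">Les Tudors</Title>'
    '<Title xml:lang="DE" type="main">Die Tudors</Title>'
    '</ProgramInformation>'
)

# The prefix map now also declares "xml", so the tokenizer can resolve it.
nsmap = {'xmlns': 'urn:tva:metadata:2012', 'xml': XML_NS}

info = etree.fromstring(SAMPLE)

# Option 1: prefixed attribute name, resolved through nsmap.
titlex = info.find('.//xmlns:Title[@xml:lang="FR"]', namespaces=nsmap)
title = titlex.text if titlex is not None else 'Missing'

# Option 2: Clark notation ({uri}name), which needs no "xml" entry at all.
clark = info.find('.//xmlns:Title[@{%s}lang="DE"]' % XML_NS, namespaces=nsmap)

# A language that is not present falls back to the placeholder word.
absent = info.find('.//xmlns:Title[@xml:lang="PL"]', namespaces=nsmap)
missing = absent.text if absent is not None else 'Missing'

print(title, clark.text, missing)
```

Note that the attribute value has to be quoted inside the predicate, and the comparison is case-sensitive, so "FR" and "fr" are different values.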