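When namespaces are defined in the root element of a source file, lxml reproduces all of them in the output. I need to do the same with xml.etree. Even better would be to output only the namespaces that are actually used, but xml.etree does not find all of them: it only declares namespaces that appear in tag or attribute names, so a prefix used only inside an attribute value (such as c in b:foo="c:bar" below) is dropped.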
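A quick way to see the gap is to compare the declarations that iterparse reports as start-ns events with the namespace URIs that actually occur in qualified ("{uri}local") tag and attribute names of the parsed tree. This is a minimal sketch: the helper names declared_namespaces and used_namespace_uris are hypothetical, and the second one only approximates what the ElementTree serializer inspects (it ignores QName objects, which this example does not use).

```python
import io
import xml.etree.ElementTree as ET

SOURCE = (
    '<desc xmlns="uri_a" xmlns:b="uri_b" xmlns:c="uri_c"'
    ' b:foo="c:bar">a</desc>')


def declared_namespaces(text):
    """Return {prefix: uri} for every xmlns declaration in the document."""
    events = ET.iterparse(io.StringIO(text), events=['start-ns'])
    # Each start-ns event carries a (prefix, uri) tuple; '' is the default ns.
    return dict(node for _, node in events)


def used_namespace_uris(root):
    """Collect URIs that occur in qualified ("{uri}local") tag/attribute names.

    Plain-string attribute values and text (such as "c:bar") are not examined,
    which is roughly what the ElementTree serializer does as well.
    """
    uris = set()
    for elem in root.iter():
        for name in [elem.tag] + list(elem.attrib):
            if isinstance(name, str) and name.startswith('{'):
                uris.add(name[1:].split('}', 1)[0])
    return uris


root = ET.fromstring(SOURCE)
declared = declared_namespaces(SOURCE)
used = used_namespace_uris(root)
missing = {p: u for p, u in declared.items() if u not in used}
print(sorted(used))  # ['uri_a', 'uri_b']
print(missing)       # {'c': 'uri_c'}
```

For the sample document, uri_c is declared but never used in a name, so ElementTree has no reason to emit xmlns:c.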
One approach is to add the namespaces forcibly with root.set(). However, this duplicates any namespaces that xml.etree does find, as shown below.
Complete example, suitable for pasting into an interactive Python prompt:
```python
import xml.etree.ElementTree as ET
try:
    from io import StringIO
except ImportError:
    from StringIO import StringIO

def get_namespaces(sourcestring):
    sourcefile = StringIO(sourcestring)
    return dict(
        [node for _, node in ET.iterparse(sourcefile, events=['start-ns'])])

ET._namespace_map = dict()  # remove any previously registered namespaces

sourcetext = (
    '<desc xmlns="uri_a" xmlns:b="uri_b" xmlns:c="uri_c"'
    ' b:foo="c:bar">a</desc>')
source = ET.fromstring(sourcetext)
namespaces = get_namespaces(sourcetext)
for prefix, uri in namespaces.items():
    ET.register_namespace(prefix, uri)
    if prefix:
        tag = 'xmlns:' + prefix
    else:
        tag = 'xmlns'
    source.set(tag, uri)

print(ET.tostring(source, encoding='unicode'))
```
The result declares xmlns and xmlns:b twice, which is not well-formed XML and causes my application to fail:
```xml
<desc xmlns="uri_a" xmlns:b="uri_b" xmlns="uri_a" xmlns:b="uri_b" xmlns:c="uri_c" b:foo="c:bar">a</desc>
```
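The failure can be reproduced without the application: XML does not allow the same attribute name to appear twice in one start tag, and xmlns and xmlns:b each appear twice in that output. A minimal sketch that checks this with ElementTree's own expat-based parser (is_well_formed is a hypothetical helper):

```python
import xml.etree.ElementTree as ET

OUTPUT = ('<desc xmlns="uri_a" xmlns:b="uri_b" xmlns="uri_a" xmlns:b="uri_b"'
          ' xmlns:c="uri_c" b:foo="c:bar">a</desc>')


def is_well_formed(text):
    """Return True if ElementTree's parser accepts text, else report and return False."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # Expat reports repeated attribute names as a parse error.
        print('rejected:', err)
        return False
    return True


print(is_well_formed(OUTPUT))
```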
This is similar to the question "Forcing xml.etree to output "unused" namespaces", but here the namespaces come from a source file, so they are not known in advance by the Python code.
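One possible workaround, sketched under the assumptions that all namespaces are declared on the root element and that each URI is bound to a single prefix: register every prefix from the source so ElementTree reuses it, but call set() only for namespaces whose URI never appears in a tag or attribute name, since the serializer declares the others itself. The helper names are hypothetical, and register_namespace modifies a process-wide registry (it also rejects prefixes of the form ns0, ns1 and so on).

```python
import io
import xml.etree.ElementTree as ET


def declared_namespaces(text):
    """Return {prefix: uri} for every xmlns declaration in the source."""
    events = ET.iterparse(io.StringIO(text), events=['start-ns'])
    return dict(node for _, node in events)


def used_namespace_uris(root):
    """URIs the serializer will declare itself (qualified tag/attribute names)."""
    uris = set()
    for elem in root.iter():
        for name in [elem.tag] + list(elem.attrib):
            if isinstance(name, str) and name.startswith('{'):
                uris.add(name[1:].split('}', 1)[0])
    return uris


def tostring_keeping_namespaces(text):
    """Serialize text, keeping each root namespace declaration exactly once."""
    root = ET.fromstring(text)
    used = used_namespace_uris(root)
    for prefix, uri in declared_namespaces(text).items():
        # Make ElementTree reuse the source prefixes instead of ns0, ns1, ...
        # Note: this mutates ElementTree's global prefix registry.
        ET.register_namespace(prefix, uri)
        if uri not in used:
            # Only add declarations the serializer would otherwise drop.
            root.set('xmlns:' + prefix if prefix else 'xmlns', uri)
    return ET.tostring(root, encoding='unicode')


sourcetext = (
    '<desc xmlns="uri_a" xmlns:b="uri_b" xmlns:c="uri_c"'
    ' b:foo="c:bar">a</desc>')
print(tostring_keeping_namespaces(sourcetext))
```

Here only xmlns:c is added through set(), so no declaration is repeated; the position of xmlns:c among the attributes may differ between Python versions, because 3.8 and later keep insertion order instead of sorting.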