preserve namespaces with xml.etree

Question

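When namespaces are declared on the root element of a source file, lxml reproduces all of them in the output. I need the same behaviour from xml.etree. Ideally it would output only the namespaces that are actually used, but xml.etree detects only namespaces that appear in element and attribute names, so it misses a prefix that occurs solely inside an attribute value (such as c:bar in the example below).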

One solution is to add the namespaces forcibly with root.set(). However, this duplicates any namespaces that xml.etree does find, as shown below.

Complete example, suitable for pasting into a Python prompt:

import xml.etree.ElementTree as ET
try:
    from io import StringIO
except ImportError:
    from StringIO import StringIO

def get_namespaces(sourcestring):
    sourcefile = StringIO(sourcestring)
    return dict(
        [node for _, node in ET.iterparse(sourcefile, events=['start-ns'])])

ET._namespace_map = dict()  # remove any previously registered namespaces
sourcetext = (
    '<desc xmlns="uri_a" xmlns:b="uri_b" xmlns:c="uri_c"'
    ' b:foo="c:bar">a</desc>')
source = ET.fromstring(sourcetext)
namespaces = get_namespaces(sourcetext)
for prefix, uri in namespaces.items():
    ET.register_namespace(prefix, uri)
    if prefix:
        tag = 'xmlns:' + prefix
    else:
        tag = 'xmlns'
    source.set(tag, uri)

print(ET.tostring(source, encoding='unicode'))

The result, which causes my application to fail:

<desc xmlns="uri_a" xmlns:b="uri_b" xmlns="uri_a" xmlns:b="uri_b" xmlns:c="uri_c" b:foo="c:bar">a</desc>

This is similar to the question "Forcing xml.etree to output 'unused' namespaces", but here the namespaces come from a source file, so they are not known to the Python code in advance.

score 0 · Answer 1 · answered Jun 01 '20 at 19:38

First, serialize the tree without adding the missing namespaces. Parse that output to see which namespaces xml.etree declared by itself. Then add only the namespaces that were not found and generate the final output.

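def add_namespaces_not_found(root):
    # Serialize once to see which namespaces xml.etree declares by itself.
    result_with_namespaces_found = ET.tostring(root, encoding='unicode')
    namespaces_found = get_namespaces(result_with_namespaces_found)
    # `namespaces` is the module-level dict built from the source text in the
    # question; its prefixes must already be registered with ET.register_namespace().
    for prefix, uri in namespaces.items():
        if prefix not in namespaces_found:
            if prefix:
                tag = 'xmlns:' + prefix
            else:
                tag = 'xmlns'
            root.set(tag, uri)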

Result:

<desc xmlns="uri_a" xmlns:b="uri_b" xmlns:c="uri_c" b:foo="c:bar">a</desc>
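The function above relies on the global namespaces dict and on the prefixes having been registered beforehand, and the call itself is not shown. A self-contained sketch of the whole flow might look like the following. Passing namespaces as a second parameter is my own change for illustration, not part of the snippet above, and the attribute order in the output can differ between Python versions.

```python
import xml.etree.ElementTree as ET
from io import StringIO


def get_namespaces(sourcestring):
    # 'start-ns' events yield (prefix, uri) tuples; the default prefix is ''.
    events = ET.iterparse(StringIO(sourcestring), events=['start-ns'])
    return dict(node for _, node in events)


def add_namespaces_not_found(root, namespaces):
    # Assumption: `namespaces` is passed in rather than read from a global.
    found = get_namespaces(ET.tostring(root, encoding='unicode'))
    for prefix, uri in namespaces.items():
        if prefix not in found:
            root.set('xmlns:' + prefix if prefix else 'xmlns', uri)


sourcetext = (
    '<desc xmlns="uri_a" xmlns:b="uri_b" xmlns:c="uri_c"'
    ' b:foo="c:bar">a</desc>')
source = ET.fromstring(sourcetext)
namespaces = get_namespaces(sourcetext)

# Register the source prefixes first so xml.etree reuses them
# instead of inventing ns0, ns1, ...
for prefix, uri in namespaces.items():
    ET.register_namespace(prefix, uri)

add_namespaces_not_found(source, namespaces)
result = ET.tostring(source, encoding='unicode')
print(result)
```

Each of the three declarations should appear exactly once in the printed element.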

Solutions that do not require generating the output twice are welcome.
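One possible way to avoid the second serialization, offered only as a sketch: walk the tree once and collect the namespace URIs that occur in element tags and attribute names, on the assumption that these are exactly the ones xml.etree declares by itself. Anything else from the source file is then added by hand. The helper names used_namespace_uris and add_unused_namespaces are hypothetical, and the sketch ignores rarer cases such as QName objects used as text or attribute values.

```python
import xml.etree.ElementTree as ET
from io import StringIO


def get_namespaces(sourcestring):
    # Collect the (prefix, uri) pairs declared in the source text.
    events = ET.iterparse(StringIO(sourcestring), events=['start-ns'])
    return dict(node for _, node in events)


def used_namespace_uris(root):
    # Hypothetical helper: URIs appearing in '{uri}local' tags or attribute names.
    uris = set()
    for elem in root.iter():
        for name in [elem.tag] + list(elem.keys()):
            # Comments and processing instructions have a non-string tag.
            if isinstance(name, str) and name.startswith('{'):
                uris.add(name[1:].rsplit('}', 1)[0])
    return uris


def add_unused_namespaces(root, namespaces):
    # Hypothetical helper: register every prefix, then declare by hand
    # only the namespaces that no tag or attribute name refers to.
    used = used_namespace_uris(root)
    for prefix, uri in namespaces.items():
        ET.register_namespace(prefix, uri)
        if uri not in used:
            root.set('xmlns:' + prefix if prefix else 'xmlns', uri)


sourcetext = (
    '<desc xmlns="uri_a" xmlns:b="uri_b" xmlns:c="uri_c"'
    ' b:foo="c:bar">a</desc>')
source = ET.fromstring(sourcetext)
add_unused_namespaces(source, get_namespaces(sourcetext))
single_pass_result = ET.tostring(source, encoding='unicode')
print(single_pass_result)
```

This compares URIs rather than prefixes, because the parsed tree stores only URIs; the prefixes exist only in the source text and in the registry.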
