
I have an input XML file:

<?xml version='1.0' encoding='utf-8'?>
<configuration>
  <runtime name="test" version="1.2" xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
    <ns0:assemblyBinding>
      <ns0:dependentAssembly />
    </ns0:assemblyBinding>
  </runtime>
</configuration>

...and a Python script:

import xml.etree.ElementTree as ET

file_xml = 'test.xml'

tree = ET.parse(file_xml)
root = tree.getroot()
print(root.tag)
print(root.attrib)

element_runtime = root.find('.//runtime')
print(element_runtime.tag)
print(element_runtime.attrib)

tree.write(file_xml, xml_declaration=True, encoding='utf-8', method="xml")

...which gives the following output:

>test.py
configuration
{}
runtime
{'name': 'test', 'version': '1.2'}

...and has the undesirable side effect of rewriting the XML as:

<?xml version='1.0' encoding='utf-8'?>
<configuration xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
  <runtime name="test" version="1.2">
    <ns0:assemblyBinding>
      <ns0:dependentAssembly />
    </ns0:assemblyBinding>
  </runtime>
</configuration> 

My original script modifies the XML, so I do have to call tree.write and save the edited file. The problem is that ElementTree, when writing the file, moves the xmlns attribute from the runtime element up to the root element configuration, which is not desirable in my case.

I can't remove the xmlns attribute from the root element (i.e. delete it from the element's attribute dictionary) because it is not listed among its attributes (unlike the attributes listed for the runtime element).

Why does the xmlns attribute never get listed among the attributes of any element?

How can I force ElementTree to keep the xmlns attribute on its original element?

I am using Python 3.5.1 on Windows.

Bojan Komazec
  • `etree` [pulls all namespaces into the first element](https://hg.python.org/cpython/file/v3.5.0/Lib/xml/etree/ElementTree.py#l771) as it internally doesn't track on which element the namespace was declared originally. If you don't want that, you'll have to write your own serialisation logic, or use lxml instead. But it shouldn't really make any difference where the namespace is declared. – mata Feb 23 '16 at 11:17
  • I am using Python to modify .NET app config file which must not contain namespace declarations in the root element (http://blogs.msdn.com/b/junfeng/archive/2008/03/24/app-config-s-root-element-should-be-namespace-less.aspx). – Bojan Komazec Feb 23 '16 at 11:23
  • What? WTF is Microsoft using to parse XML??? I guess then your best choice will be to use [`lxml`](http://lxml.de/) instead of `xml.etree`, as it seems to respect the positioning of namespace declarations. – mata Feb 23 '16 at 11:45
  • Yeah, that was also my first reaction...Installing lxml now. – Bojan Komazec Feb 23 '16 at 11:52
  • Yup, `lxml` preserves original location of `xmlns` attribute. – Bojan Komazec Feb 23 '16 at 12:06
  • @mata Please turn your comments into an answer so I can accept it. – Bojan Komazec Feb 24 '16 at 10:59

2 Answers


xml.etree.ElementTree pulls all namespace declarations up into the first (root) element it writes, because internally it doesn't track on which element each namespace was originally declared.
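A minimal standard-library sketch makes both effects visible, using the question's sample embedded as a bytes literal instead of test.xml: the parser consumes the declaration (so it never appears in .attrib and survives only inside the expanded tag names), and the serializer then declares the namespace on the root.

```python
import xml.etree.ElementTree as ET

SAMPLE = b"""<configuration>
  <runtime name="test" version="1.2" xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
    <ns0:assemblyBinding>
      <ns0:dependentAssembly />
    </ns0:assemblyBinding>
  </runtime>
</configuration>"""

root = ET.fromstring(SAMPLE)
runtime = root.find("runtime")

# The declaration is consumed by the parser: it is not an attribute...
print(runtime.attrib)   # {'name': 'test', 'version': '1.2'}
# ...it only survives inside the expanded ("Clark notation") tag names
print(runtime[0].tag)   # {urn:schemas-microsoft-com:asm.v1}assemblyBinding

# On output, ElementTree declares every namespace it needs on the root
first_line = ET.tostring(root, encoding="unicode").splitlines()[0]
print(first_line)       # <configuration xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
```

This is also why there is nothing to delete from the root's attribute dictionary: the declaration is regenerated at write time.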

If you don't want that, you'll have to write your own serialisation logic.
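One way such serialisation logic could look, as a sketch only: it records each declaration with ET.iterparse(..., events=("start-ns", "start")), whose "start-ns" events arrive just before the element that declares them, and then writes the tree by hand. The helpers parse_keeping_ns and serialize are hypothetical names made up for this example. Assumed limitations: comments and processing instructions are dropped, namespaced attributes need a non-empty prefix, and a prefix re-bound to a different URI further down is not handled.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

XML_NS = "http://www.w3.org/XML/1998/namespace"


def parse_keeping_ns(source):
    """Parse XML and remember which element declared each namespace.

    Returns (root, decls): decls maps id(element) to the list of
    (prefix, uri) pairs declared on that element ("" = default namespace).
    """
    decls, pending, root = {}, [], None
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            pending.append(item)  # (prefix, uri), reported before its element
        else:
            if root is None:
                root = item
            if pending:
                decls[id(item)] = pending
                pending = []
    return root, decls


def _qname(name, scope):
    """Turn '{uri}local' (Clark notation) into 'prefix:local'."""
    if not name.startswith("{"):
        return name
    uri, local = name[1:].split("}", 1)
    prefix = scope[uri]  # KeyError if the URI was never declared
    return f"{prefix}:{local}" if prefix else local


def serialize(elem, decls, scope=None):
    """Serialize elem, writing each xmlns declaration where it was parsed."""
    scope = dict(scope or {XML_NS: "xml"})
    own = decls.get(id(elem), [])
    for prefix, uri in own:
        scope[uri] = prefix
    tag = _qname(elem.tag, scope)
    parts = ["<" + tag]
    for name, value in elem.attrib.items():
        parts.append(f" {_qname(name, scope)}={quoteattr(value)}")
    for prefix, uri in own:
        attr = f"xmlns:{prefix}" if prefix else "xmlns"
        parts.append(f" {attr}={quoteattr(uri)}")
    if len(elem) == 0 and not elem.text:
        parts.append(" />")
        return "".join(parts)
    parts.append(">" + escape(elem.text or ""))
    for child in elem:
        parts.append(serialize(child, decls, scope))
        parts.append(escape(child.tail or ""))
    parts.append(f"</{tag}>")
    return "".join(parts)


SAMPLE = b"""<configuration>
  <runtime name="test" version="1.2" xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
    <ns0:assemblyBinding>
      <ns0:dependentAssembly />
    </ns0:assemblyBinding>
  </runtime>
</configuration>"""

root, decls = parse_keeping_ns(io.BytesIO(SAMPLE))
root.find("runtime").set("version", "1.3")  # stand-in for the real edits
result = serialize(root, decls)
print("<?xml version='1.0' encoding='utf-8'?>")
print(result)
```

Because the tree itself is an ordinary ElementTree, the existing edit code keeps working; only the final tree.write call would be swapped for writing the declaration line plus serialize(root, decls) to the file.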

The better alternative would be to use lxml instead of xml.etree, because it preserves the location where a namespace prefix is declared.
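A short sketch of the lxml route on the question's sample (assuming lxml is installed): lxml exposes the in-scope declarations through the element's nsmap, and after an edit the declaration is still written on runtime rather than on configuration.

```python
from lxml import etree

SAMPLE = b"""<configuration>
  <runtime name="test" version="1.2" xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
    <ns0:assemblyBinding>
      <ns0:dependentAssembly />
    </ns0:assemblyBinding>
  </runtime>
</configuration>"""

root = etree.fromstring(SAMPLE)
runtime = root.find("runtime")

# nsmap lists the prefixes in scope: none on the root, ns0 on <runtime>
print(root.nsmap)
print(runtime.nsmap)

# Edit something, then serialize: the declaration stays on <runtime>
runtime.set("version", "1.3")
out = etree.tostring(root, encoding="unicode")
print(out)
```

To save the file as in the question, etree.ElementTree(root).write("test.xml", xml_declaration=True, encoding="utf-8") can be used; note that libxml2 may write the xmlns declaration before the regular attributes.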

mata

Following @mata's advice, here is an answer with an example, including the code and the XML file.

The XML input is shown in the picture (original and modified).

The Python code checks the text of each NtnlCcy node; if it is "EUR", it converts the Price to USD (by multiplying by EURUSD = 1.2) and changes the NtnlCcy text to "USD".

The Python code is as follows:

from lxml import etree

pathToXMLfile = r"C:\Xiang\codes\Python\afmreports\test_original.xml"
tree = etree.parse(pathToXMLfile)
root = tree.getroot()
EURUSD = 1.2  # exchange rate used for the conversion

for Rchild in root:
    print("Root child: ", Rchild.tag, ". \n")

    # Tags are in Clark notation, e.g. "{001.001}Pyld"
    if Rchild.tag.endswith("Pyld"):
        for PyldChild in Rchild:
            print("Pyld Child: ", PyldChild.tag, ". \n")
        Doc = Rchild.find('{001.003}Document')
        if Doc is None:
            continue
        FinInstrNodes = Doc.findall('{001.003}FinInstr')

        for FinInstrNode in FinInstrNodes:
            FinCcyNode = FinInstrNode.find('{001.003}NtnlCcy')
            FinPriceNode = FinInstrNode.find('{001.003}Price')

            # Reset for every instrument so a missing NtnlCcy is not
            # confused with the previous instrument's currency
            CcyNodeText = ""
            if FinCcyNode is not None:
                CcyNodeText = FinCcyNode.text
            if CcyNodeText == "EUR" and FinPriceNode is not None:
                Price = float(FinPriceNode.text)
                FinPriceNode.text = str(Price * EURUSD)
                FinCcyNode.text = "USD"

tree.write(r"C:\Xiang\codes\Python\afmreports\test_modified.xml", encoding="utf-8", xml_declaration=True)
print("\n the program runs to the end! \n")
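The code above spells every namespace out in Clark notation ('{001.003}FinInstr'). As a possible alternative, both xml.etree and lxml accept a prefix-to-URI mapping in find, findall and findtext. A minimal standard-library sketch on a trimmed copy of the sample file; the prefixes biz and doc are arbitrary local aliases chosen for this example, only the URIs have to match the file:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample file (one line per instrument)
SAMPLE = b"""<BizData xmlns="001.001">
<Pyld>
    <Document xmlns="001.003">
        <FinInstr><Id>NLENX240</Id><NtnlCcy>EUR</NtnlCcy><Price>9</Price></FinInstr>
        <FinInstr><Id>NLENX681</Id><NtnlCcy>USD</NtnlCcy><Price>10</Price></FinInstr>
        <FinInstr><Id>NLENX320</Id><NtnlCcy>EUR</NtnlCcy><Price>11</Price></FinInstr>
    </Document>
</Pyld>
</BizData>"""

# Local aliases for the namespace URIs used in the file
NS = {"biz": "001.001", "doc": "001.003"}

root = ET.fromstring(SAMPLE)
eur_ids = [
    instr.findtext("doc:Id", namespaces=NS)
    for instr in root.findall("biz:Pyld/doc:Document/doc:FinInstr", NS)
    if instr.findtext("doc:NtnlCcy", namespaces=NS) == "EUR"
]
print(eur_ids)  # ['NLENX240', 'NLENX320']
```

The same path strings and mapping can be passed to lxml's find/findall, so the loop over root children could be replaced by one path query.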

Comparing the original and modified XML files, the namespace declarations and the overall structure of the XML remain unchanged; only some NtnlCcy and Price nodes have been changed, as desired.
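A side note on the conversion itself: Price * EURUSD uses binary floats, so str(9 * 1.2) is '10.799999999999999' rather than '10.8'. If the written prices should keep exact decimal values, a sketch using the standard decimal module (eur_to_usd is a hypothetical helper name) could replace that multiplication:

```python
from decimal import Decimal

EURUSD = Decimal("1.2")  # same rate as the answer, but as an exact decimal


def eur_to_usd(price_text):
    """Convert a price string from EUR to USD with decimal arithmetic."""
    return str(Decimal(price_text) * EURUSD)


float_result = str(9 * 1.2)
print(float_result)        # 10.799999999999999  (binary float rounding)
print(eur_to_usd("9"))     # 10.8
print(eur_to_usd("11"))    # 13.2
```

Inside the loop this would read FinPriceNode.text = eur_to_usd(FinPriceNode.text).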

The only unwanted difference is the first line: the original XML file has <?xml version="1.0" encoding="UTF-8"?>, while the modified file has <?xml version='1.0' encoding='UTF-8'?>. The quotation marks change from double to single, but this minor difference should not matter, since both quote styles are valid in an XML declaration.
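If the double quotes do matter for some consumer, one workaround (a sketch, assuming lxml and a UTF-8 file; the helper name is made up here) is to let lxml serialize without a declaration and prepend the declaration yourself:

```python
from lxml import etree


def to_bytes_with_double_quoted_decl(tree):
    """Serialize an lxml ElementTree behind a double-quoted XML declaration."""
    # Ask lxml explicitly not to write its own (single-quoted) declaration
    body = etree.tostring(tree, encoding="UTF-8", xml_declaration=False)
    return b'<?xml version="1.0" encoding="UTF-8"?>\n' + body


tree = etree.ElementTree(etree.fromstring(b"<BizData><Hdr/></BizData>"))
data = to_bytes_with_double_quoted_decl(tree)
print(data.decode("utf-8"))
```

The returned bytes can then be saved with open(path, "wb") instead of calling tree.write.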

The original file content is included below for easy testing:

<?xml version="1.0" encoding="UTF-8"?>
<BizData xmlns="001.001">
<Hdr>
    <AppHdr xmlns="001.002">
        <Fr>
            <Id>XXX01</Id>
        </Fr>
        <To>
            <Id>XXX02</Id>
        </To>
        <CreDt>2019-10-25T15:38:30</CreDt>
    </AppHdr>
</Hdr>
<Pyld>
    <Document xmlns="001.003">
        <FinInstr>
            <Id>NLENX240</Id>
            <FullNm>AO.AAI</FullNm>
            <NtnlCcy>EUR</NtnlCcy>
            <Price>9</Price>
        </FinInstr>
        <FinInstr>
            <Id>NLENX681</Id>
            <FullNm>AO.ABN</FullNm>
            <NtnlCcy>USD</NtnlCcy>
            <Price>10</Price>
        </FinInstr>
        <FinInstr>
            <Id>NLENX320</Id>
            <FullNm>AO.ING</FullNm>
            <NtnlCcy>EUR</NtnlCcy>
            <Price>11</Price>
        </FinInstr>
    </Document>
</Pyld>
</BizData>
XYZ