
I have a problem with some values not being declared as XML elements in my XML files. But for further processing, I need them to be an element. Example:

<A>
<B id="254">
    <C>Lore</C>
    <D>9</D> 
    12.34
</B>
<B id="255">
    <C>Ipsum</C>
    <D>125</D> 
    23.45
</B>
<E/>
<F id="256">
    <G>Lore Ipsum
        <E>79</E> 
        34.56
    </G>
</F>
</A>

In the end, the XML file should look similar to this:

<A>
<B id="254">
    <C>Lore</C>
    <D>9</D> 
    <Z>12.34</Z> 
</B>
<B id="255">
    <C>Ipsum</C>
    <D>125</D> 
    <Z>23.45</Z>
</B>
<E/>
<F id="256">
    <G>Lore Ipsum
        <E>79</E>
        <Y>34.56</Y> 
    </G>
</F>
</A>

I looked through the Python documentation but only found ways to add a new element with a value, not to wrap existing text in an element.

mzjn
JME

2 Answers


What you are looking for can be done, but it's a little complicated. You should use lxml instead of ElementTree, because lxml has better XPath support and provides getparent(), which the code below relies on. The code also uses an f-string to build the new element.

So altogether:

from lxml import etree
tt = """[your xml above]"""

doc = etree.XML(tt)
for t in doc.xpath('.//*'):
    # .tail is the text that follows an element's closing tag; it is None when there is none
    tail = (t.tail or "").strip()
    if tail:
        # conditional expression: use <Z> inside <B>, otherwise <Y>
        elem = "Z" if t.getparent().tag == "B" else "Y"
        # build the new element with an f-string and append it to the parent
        t.getparent().append(etree.fromstring(f'<{elem}>{tail}</{elem}>'))
        # remove the original loose text
        t.tail = ""
# etree.indent requires lxml 4.5 or above
etree.indent(doc, space='  ')
print(etree.tostring(doc).decode())

The output should be your expected output.
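
Since the tail concept is the part that usually trips people up, here is a minimal sketch of it. It uses the standard library's xml.etree.ElementTree only so that it runs without extra packages; lxml exposes the same .text and .tail attributes. The one-line XML string is a shortened, hypothetical version of the first B element from the question.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the first <B> element from the question
b = ET.fromstring("<B><C>Lore</C><D>9</D>12.34</B>")
c = b.find("C")
d = b.find("D")

# .text is the text inside an element, before its first child
print(repr(c.text))
# .tail is the text after an element's closing tag, up to the next tag
print(repr(d.tail))
# nothing sits between </C> and <D>, so this tail is None rather than ""
print(repr(c.tail))
```

So the loose 12.34 does not belong to B's text at all; it is stored as the tail of D, which is why the loop inspects t.tail and has to cope with None.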

Jack Fleeting

You can do this with the built-in xml.etree.ElementTree module:

import xml.etree.ElementTree as ET

tree = ET.parse('Start_extended.xml')
root = tree.getroot()
# ET.dump(root)

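def set_z(elem, val):
    # set the text of the new Z or Y element
    elem.text = val
    return elem

def catch_tail(root, b_elem):
    # read the tail text (the text after the closing tag) without surrounding whitespace
    tail_text = (b_elem.tail or '').strip()
    return tail_text

def reset_tail(root, b_elem):
    # remove the tail text, keeping only a line break
    b_elem.tail = '\n'
    return b_elem.tail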


for elem in root.iter():
    #print(elem.tag)
    if elem.tag == 'B':
        z = ET.SubElement(elem, 'Z')
    if elem.tag == 'G':
        y = ET.SubElement(elem, 'Y') 
    if elem.tag == "D":
        val = catch_tail(root, elem)
        reset_tail(root, elem)
    if elem.tag == "Z":
        set_z(elem, val)
    if elem.tag == "E":
        val1 = catch_tail(root, elem)
        reset_tail(root, elem)
    if elem.tag == "Y":
        set_z(elem, val1)
       
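# ET.indent requires Python 3.9 or above; it produces the indentation shown in the output
ET.indent(root, space='  ')
ET.dump(root)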

tree.write("new.xml", encoding='utf-8', xml_declaration=True)

Output:

<?xml version="1.0" encoding="utf-8"?>
<A>
  <B id="254">
    <C>Lore</C>
    <D>9</D>
    <Z>12.34</Z>
  </B>
  <B id="255">
    <C>Ipsum</C>
    <D>125</D>
    <Z>23.45</Z>
  </B>
  <E />
  <F id="256">
    <G>Lore Ipsum 
    <E>79</E>
    <Y>34.56</Y>
    </G>
  </F>
</A>
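
The loop above is tied to the tag names D and E. If the loose text can follow other elements as well, a more general variant might look like the sketch below. It is only a sketch under assumptions: wrap_tail_text, tag_by_parent and default_tag are hypothetical names, not part of ElementTree, and the sample string is a compact version of the XML from the question. The new element is inserted right after the element whose tail it was, so the document order is kept.

```python
import xml.etree.ElementTree as ET

def wrap_tail_text(root, tag_by_parent, default_tag="Y"):
    """Move loose text that follows a child element into a new element."""
    # Snapshot the elements first so inserting new ones does not disturb iteration
    for parent in list(root.iter()):
        # Walk the children back to front so insert positions stay valid
        for index in range(len(parent) - 1, -1, -1):
            child = parent[index]
            tail = child.tail or ""
            loose = tail.strip()
            if not loose:
                continue
            start = tail.index(loose)
            new = ET.Element(tag_by_parent.get(parent.tag, default_tag))
            new.text = loose
            # Whitespace after the loose text moves to the new element
            new.tail = tail[start + len(loose):]
            # Whitespace before the loose text stays on the original child
            child.tail = tail[:start]
            parent.insert(index + 1, new)
    return root

# Compact version of the XML from the question
sample = (
    '<A><B id="254"><C>Lore</C><D>9</D>12.34</B><E/>'
    '<F id="256"><G>Lore Ipsum<E>79</E>34.56</G></F></A>'
)
root = wrap_tail_text(ET.fromstring(sample), {"B": "Z"})
print(ET.tostring(root, encoding="unicode"))
```

The mapping {"B": "Z"} says which tag to use under which parent; everything else falls back to default_tag. Text that comes before the first child (such as Lore Ipsum in G) is the parent's .text, not a tail, so it is left untouched.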
Hermann12