I am using lxml to modify an XML document. In the following code, I want to delete the child of every "heading" element and assign the child's text to its parent.

From:

<heading><whateverchildvalue>TEXTIWANT</whateverchildvalue></heading>

to:

<heading>TEXTIWANT</heading>

I tried to do this with a loop. However, when I call node.remove(attr_children[0]), execution leaves the loop and proceeds to the next call, ET.tostring(parsed), without modifying the second "heading". To see this, remove the node.remove(attr_children[0]) line, re-run the code below, and compare the printed output with the previous run. What am I doing wrong? How can I make the loop visit every "heading" element in the XML string and assign the child text to it?

from lxml import etree as ET

xml_string="""
<note>
<to>Tove</to>
<mybigheader>
    <heading><deleteme>Jani</deleteme></heading>
    <heading><wantkey>Reminder</wantkey></heading>
</mybigheader>
<body>Don't forget me this weekend!</body>
</note>
"""


def modif_xml(xml_string):

    parsed = ET.fromstring(xml_string)
    for node in parsed.iter():
        print("node is ", node)
        if "heading" in node.tag:
             attr_children =  node.getchildren()
             for i in attr_children:
                 child_tag = i.tag
                 child_value = i.text
             # This is the call after which the loop stops early
             node.remove(attr_children[0])
             node.text = child_value

    my_xml = ET.tostring(parsed)
    root = ET.XML(my_xml)
    print(ET.tostring(root, pretty_print=True).decode())


modif_xml(xml_string)
sateayam

1 Answer

Consider XSLT, the special-purpose declarative language for transforming XML documents. Python's lxml module can run XSLT 1.0 stylesheets. While this may seem like overkill, it avoids explicit for loops and if logic entirely. Also, XSLT stylesheets are themselves XML and can be handled like any other XML: parsed from a string or a file.

Specifically, the stylesheet below runs the Identity Transform (which copies the document as is), then overrides the template for heading's children by calling apply-templates without xsl:copy. Skipping xsl:copy drops the child element itself while its text is still written into heading:

import lxml.etree as et

xml_string="""
<note>
<to>Tove</to>
<mybigheader>
    <heading><deleteme>Jani</deleteme></heading>
    <heading><wantkey>Reminder</wantkey></heading>
</mybigheader>
<body>Don't forget me this weekend!</body>
</note>
"""    
dom = et.fromstring(xml_string)

xsl_string='''
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" method="xml"/>
<xsl:strip-space elements="*"/>

  <!-- Identity Transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>  

  <xsl:template match="heading/*">
     <xsl:apply-templates />
  </xsl:template>

</xsl:transform>
'''    
xslt = et.fromstring(xsl_string)

transform = et.XSLT(xslt)
newdom = transform(dom)

print(newdom)

# <?xml version="1.0"?>
# <note>
#   <to>Tove</to>
#   <mybigheader>
#     <heading>Jani</heading>
#     <heading>Reminder</heading>
#   </mybigheader>
#   <body>Don't forget me this weekend!</body>
# </note>
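
For comparison, here is a minimal loop-based sketch that stays in plain Python. It assumes the early exit in the question comes from removing elements while parsed.iter() is still walking the tree, which is a likely cause but not something confirmed above. The function name flatten_headings is made up for illustration, and the fallback import is only there so the sketch also runs where lxml is not installed (the calls used exist in both libraries).

```python
try:
    from lxml import etree as ET
except ImportError:  # fallback: the standard library offers the same calls used below
    import xml.etree.ElementTree as ET

xml_string = """
<note>
<to>Tove</to>
<mybigheader>
    <heading><deleteme>Jani</deleteme></heading>
    <heading><wantkey>Reminder</wantkey></heading>
</mybigheader>
<body>Don't forget me this weekend!</body>
</note>
"""


def flatten_headings(xml_text):
    """Hypothetical helper: replace each heading's first child with that child's text."""
    root = ET.fromstring(xml_text)
    # Snapshot the matching elements into a list BEFORE modifying the tree,
    # so that removals cannot disturb the iteration.
    for heading in list(root.iter("heading")):
        children = list(heading)
        if not children:
            continue  # nothing to flatten
        child = children[0]
        text = child.text
        heading.remove(child)
        heading.text = text
    return root


root = flatten_headings(xml_string)
print(ET.tostring(root, encoding="unicode"))
```

The only structural change from the question's loop is the list(...) around the iterator; the XSLT version above remains the more declarative option.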
Parfait