
I have to work with an XML file that seems weirdly sparse compared to what I saw in tutorials.

    <A>
       <B>
          <C>text<C>
          <D>text2<D>
          <E a=v1 b=v2>
             <F>v3</F>
          </E>
       </B>
   </A>

How can I modify the values v1, v2 and v3?

jrd1
Benji_90210
  • It depends on the library and your use of it. What have you tried so far? Also, I'm asserting that "weirdly parse" is a typo and should actually be "weirdly sparse", yes? – jrd1 Apr 20 '23 at 17:44
  • Why not use XSLT for the task? – Yitzhak Khabinsky Apr 20 '23 at 19:50
  • @jrd1 yes, a typo. I was planning to use ElementTree, as it seems to be the most used library in my team; after that, whatever does the job. I'll take a look at @Yitzhak Khabinsky's solution (XSLT) tomorrow. – Benji_90210 Apr 20 '23 at 22:27

1 Answer


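Here is how to do it with XSLT.

It uses the so-called Identity Transform pattern: one generic template copies every node and attribute unchanged, and more specific templates override it only for the values that need to change.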
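As a rough analogy only (this is plain Python, not XSLT, and not part of the solution itself), the copy-everything-then-override idea can be sketched with the standard-library `xml.etree.ElementTree`. The function name `copy_tree` and the two override dictionaries are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET


def copy_tree(elem, attr_overrides, text_overrides):
    """Recursively copy elem, replacing only the values named in the overrides."""
    new = ET.Element(elem.tag)
    # Default rule: copy each attribute as-is.
    # Override rule: a (tag, attribute) key supplies a new value.
    for name, value in elem.attrib.items():
        new.set(name, attr_overrides.get((elem.tag, name), value))
    # Default rule: copy the text as-is. Override rule: keyed by tag.
    new.text = text_overrides.get(elem.tag, elem.text)
    new.tail = elem.tail
    for child in elem:
        new.append(copy_tree(child, attr_overrides, text_overrides))
    return new


source = ET.fromstring('<A><B><C>text</C><E a="v1" b="v2"><F>v3</F></E></B></A>')
result = copy_tree(
    source,
    attr_overrides={("E", "a"): "v1_new", ("E", "b"): "v2_new"},
    text_overrides={"F": "v3_new"},
)
print(ET.tostring(result, encoding="unicode"))
```

Everything not named in an override passes through untouched, which is the property the XSLT version relies on.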

I had to fix the XML sample to make it well-formed:

  • Opening and closing tags did not match (<C> and <D> were never closed).
  • Attribute values were not enclosed in quotes.
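Both problems are easy to confirm with any XML parser. A minimal sketch using Python's standard-library `xml.etree.ElementTree` follows; the helper name `is_well_formed` and the variable names are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The sample as posted: <C> and <D> are reopened instead of closed,
# and the attribute values of <E> are not quoted.
original = """<A>
   <B>
      <C>text<C>
      <D>text2<D>
      <E a=v1 b=v2>
         <F>v3</F>
      </E>
   </B>
</A>"""

# The corrected sample: matching end tags and quoted attribute values.
fixed = """<A>
    <B>
        <C>text</C>
        <D>text2</D>
        <E a="v1" b="v2">
            <F>v3</F>
        </E>
    </B>
</A>"""


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(original))  # False
print(is_well_formed(fixed))     # True
```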

Input XML

<A>
    <B>
        <C>text</C>
        <D>text2</D>
        <E a="v1" b="v2">
            <F>v3</F>
        </E>
    </B>
</A>

XSLT

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" encoding="utf-8"
                omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:param name="v1" select="'v1_new'"/>
    <xsl:param name="v2" select="'v2_new'"/>
    <xsl:param name="v3" select="'v3_new'"/>

    <!--Identity Transform pattern-->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="E/@a">
        <xsl:attribute name="a">
            <xsl:value-of select="$v1"/>
        </xsl:attribute>
    </xsl:template>

    <xsl:template match="E/@b">
        <xsl:attribute name="b">
            <xsl:value-of select="$v2"/>
        </xsl:attribute>
    </xsl:template>

    <xsl:template match="F/text()">
        <xsl:value-of select="$v3"/>
    </xsl:template>
</xsl:stylesheet>

Output XML

<A>
  <B>
    <C>text</C>
    <D>text2</D>
    <E a="v1_new" b="v2_new">
      <F>v3_new</F>
    </E>
  </B>
</A>

Python

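import lxml.etree as ET

inputfile = "D:\\temp\\input.xml"
xsltfile = "D:\\temp\\process.xslt"
outfile = "D:\\output\\output.xml"

dom = ET.parse(inputfile)
xslt = ET.parse(xsltfile)
transform = ET.XSLT(xslt)

# Pass the new values as XSLT string parameters.
# v3 is not passed here, so the stylesheet default ('v3_new') is used.
newdom = transform(dom,
                   v1=ET.XSLT.strparam("bk101"),
                   v2=ET.XSLT.strparam("New Author"))

# Serialize the result and write it out ("w" overwrites any existing file;
# appending would leave more than one root element in the file).
result = ET.tostring(newdom, pretty_print=True, encoding="unicode")
with open(outfile, "w", encoding="utf-8") as f:
    f.write(result)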
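The comments on the question mention ElementTree as the team's usual library. For comparison, here is a minimal sketch of the same change without XSLT, using the standard-library `xml.etree.ElementTree` and assuming the corrected input shown above. The helper `update_values` is a hypothetical name, not part of any library.

```python
import xml.etree.ElementTree as ET

xml_text = """<A>
    <B>
        <C>text</C>
        <D>text2</D>
        <E a="v1" b="v2">
            <F>v3</F>
        </E>
    </B>
</A>"""


def update_values(root, v1, v2, v3):
    """Set E/@a, E/@b and the text of E/F on the given <A> root element."""
    e = root.find("./B/E")
    if e is None:
        raise ValueError("no <E> element found under <B>")
    e.set("a", v1)   # attribute a
    e.set("b", v2)   # attribute b
    f = e.find("F")
    if f is None:
        raise ValueError("no <F> element found under <E>")
    f.text = v3      # text content of <F>
    return root


root = ET.fromstring(xml_text)
update_values(root, "v1_new", "v2_new", "v3_new")
print(ET.tostring(root, encoding="unicode"))
```

When working with files rather than strings, `ET.parse(path)` followed by `tree.getroot()` and `tree.write(path)` would take the place of `fromstring` and `tostring`.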
Parfait
Yitzhak Khabinsky
  • Lxml has the [write_output](https://lxml.de/api/lxml.etree._XSLTResultTree-class.html#write_output) method from XSLT result tree: `newdom.write_output(outfile)` (to replace last three lines). – Parfait Apr 23 '23 at 03:01