I am new to XML and was trying to copy a node. The copy itself works, but when I append it, the closing tags are mismatched. Here is the XML that I am parsing:

<doc>
    <branch name="release01" hash="f200013e">
        <sub-branch name="subrelease01">
            xml,sgml
        </sub-branch>
    </branch>
</doc>

Here is the code that I am using to parse the XML:

import lxml.etree as ET
import copy

tree = ET.ElementTree(file="doc2.xml")
root = tree.getroot()

lst_nodes = tree.findall("branch")
ele = 0

while ele < len(lst_nodes):
    ref = lst_nodes[ele]
    if (lst_nodes[ele].attrib.get("name") == "release01"):
        count = 0
        while count < 1:
            copied = copy.deepcopy(ref)
            ref.append(copied)
            count=count+1
    ele+=1

ET.dump(root)

The output observed is:

<doc>
    <branch name="release01" hash="f200013e">
        <sub-branch name="subrelease01">
            xml,sgml
        </sub-branch>
    <branch name="release01" hash="f200013e">
        <sub-branch name="subrelease01">
            xml,sgml
        </sub-branch>
    </branch>
</branch>
</doc>

As you can see, the end tag of "branch" is mismatched. Can someone help me identify the mistake I am making while copying or appending the node?

Manu

1 Answer

I believe that the output you are trying to produce is:

<doc>
    <branch name="release01" hash="f200013e">
        <sub-branch name="subrelease01">
            xml,sgml
        </sub-branch>
    </branch>
    <branch name="release01" hash="f200013e">
        <sub-branch name="subrelease01">
            xml,sgml
        </sub-branch>
    </branch>
</doc>

The mistake is that you are appending the copy of ref to ref itself, which makes the copy a child of ref. What you want is for the copy to be a sibling of ref, placed after it. To get that behaviour, append the copy to the parent element of ref, which you can reach with the getparent() method. Even more conveniently, you can call the element's addnext() method directly.

I.e. replace ref.append(copied) with ref.addnext(copied)
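As a minimal sketch of that fix, assuming the XML from the question is held in a string (the XML constant below is introduced only for illustration) rather than read from doc2.xml, the loop could look like this:

```python
import copy
import lxml.etree as ET

# Assumption: the question's document inlined as a string instead of doc2.xml.
XML = """<doc>
    <branch name="release01" hash="f200013e">
        <sub-branch name="subrelease01">
            xml,sgml
        </sub-branch>
    </branch>
</doc>"""

root = ET.fromstring(XML)

# findall() returns a list, so adding siblings while looping is safe.
for branch in root.findall("branch"):
    if branch.get("name") == "release01":
        # addnext() inserts the copy as a following sibling, not as a child.
        branch.addnext(copy.deepcopy(branch))

branches = root.findall("branch")
print(len(branches))                     # direct <branch> children of <doc>
print(len(branches[0].findall("branch")))  # <branch> nested inside the first
```

Both branch elements end up as direct children of doc, and the first one no longer contains a nested branch.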

addnext API Reference

Calimocho
  • Thanks for the answer, it worked. But after the end tag of the first branch, the new sibling starts on the same line instead of on a new line. pretty_print didn't solve the problem. Is there a solution? – Manu Jun 17 '20 at 11:40
  • That's strange. According to the API reference, dump seems to be for debugging purposes only. Does the same problem occur when printing the result of the tostring() method (with pretty_print=True), and when writing the tostring() output to a file? – Calimocho Jun 17 '20 at 11:46
  • Yes. I tried writing to a file using the tostring() method and the result was the same: no new line was added at the start of the new sibling. – Manu Jun 17 '20 at 11:54
  • Based on this answer https://stackoverflow.com/a/9612463/5910149, instantiate your tree object using a parser that removes blank text: parser = ET.XMLParser(remove_blank_text=True) tree = ET.ElementTree(file="doc2.xml", parser=parser) – Calimocho Jun 17 '20 at 12:02
  • An explanation for this behaviour is given in the lxml documentation: https://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output – Calimocho Jun 17 '20 at 12:06
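The suggestion in the comments above can be sketched end to end as follows. This is only an illustration, again assuming the question's XML is inlined as a string (the XML constant is not part of the original code); the parser option and pretty_print are the ones named in the comments:

```python
import copy
import lxml.etree as ET

# Assumption: the question's document inlined as a string instead of doc2.xml.
XML = """<doc>
    <branch name="release01" hash="f200013e">
        <sub-branch name="subrelease01">
            xml,sgml
        </sub-branch>
    </branch>
</doc>"""

# Drop whitespace-only text between elements so pretty_print can re-indent.
parser = ET.XMLParser(remove_blank_text=True)
root = ET.fromstring(XML, parser)

branch = root.find("branch")
branch.addnext(copy.deepcopy(branch))

# encoding="unicode" makes tostring() return str instead of bytes.
pretty = ET.tostring(root, pretty_print=True, encoding="unicode")
print(pretty)
```

With the whitespace-only text nodes removed at parse time, the serializer is free to put each branch element on its own line. The text inside sub-branch is not blank, so it is left as it was.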