I've been searching for a way to remove grandchildren from an XML file, but I haven't found a clean solution. Here's my case:

<tree>
    <category title="Item 1">item 1 text
        <subitem title="subitem1">subitem1 text</subitem>
        <subitem title="subitem2">subitem2 text</subitem>
    </category>

    <category title="Item 2">item 2 text
        <subitem title="subitem21">subitem21 text</subitem>
        <subitem title="subitem22">subitem22 text</subitem>
            <subsubitem title="subsubitem211">subsubitem211 text</subsubitem>
    </category>
</tree>

In some cases I want to remove the subitem elements; in others, the subsubitem elements. I know I can do it like this for the content above:

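import xml.etree.ElementTree as ET

root = ET.fromstring(given_content)  # given_content holds the XML above
# case 1: remove the children of each category
for category in root:
    # iterate over a copy; removing during iteration skips siblings
    for subitem in list(category):
        category.remove(subitem)

# case 2: remove the grandchildren of each category
for category in root:
    for subitem in category:
        for subsubitem in list(subitem):
            subitem.remove(subsubitem)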

This style only works when I know the depth of the target node. If I only know the tag name of the node I want to remove, how should I implement it? Pseudo-code:

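import xml.etree.ElementTree as ET

root = ET.fromstring(given_content)
for item in root.iter():  # getiterator() was removed in Python 3.9
    if item.tag in ('subsubitem', 'subitem'):
        pass  # remove item here, but how without knowing its parent?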

If I call root.remove(item), it raises a ValueError whenever item is not a direct child of root.
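
To illustrate the error described above, here is a minimal sketch. The one-branch tree is a made-up stand-in for the real data, and the variable names are only illustrative:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature tree: subitem is a grandchild of the root.
root = ET.fromstring('<tree><category><subitem/></category></tree>')
subitem = root.find('.//subitem')

error = None
try:
    # remove() only looks at direct children, so this fails.
    root.remove(subitem)
except ValueError as exc:
    error = exc

# Removing through the actual parent succeeds.
category = root.find('category')
category.remove(subitem)
```

So the removal itself is simple; the missing piece is finding each target's parent.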

Edit: I cannot install any third-party libraries, so I have to solve this with the standard xml package.

ZDunker
  • Do I correctly understand the question text to mean that you *don't* actually care about whether something is grandchild or greatgrandchild, but only about its tag, in making the determination? – Charles Duffy Mar 13 '17 at 23:41
  • @CharlesDuffy Yes, you are right. Find the node with target name and remove it, regardless of the depth. – ZDunker Mar 13 '17 at 23:47

2 Answers

I finally got this working with only the standard xml library by writing a recursive function.

def recursive_xml(root):
    # list(root) copies the children, so removing one does not skip its
    # sibling; getchildren() was removed in Python 3.9.
    for child in list(root):
        if child.tag in ('subitem', 'subsubitem'):
            root.remove(child)
        else:
            recursive_xml(child)

The function visits every remaining node in the tree and removes the target nodes wherever they appear.

test_xml = r'''
<test>
    <test1>
        <test2>
            <test3>
            </test3>
            <subsubitem>
            </subsubitem>
        </test2>
        <subitem>
        </subitem>
        <nothing_matters>
        </nothing_matters>
    </test1>
</test>
'''
import xml.etree.ElementTree as ET

root = ET.fromstring(test_xml)
recursive_xml(root)

Hope this helps someone who has the same restrictions as I do.
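
A non-recursive variant is also possible with the standard library alone. The sketch below builds a child-to-parent map first, since ElementTree elements have no getparent(). The helper name remove_by_tag and the sample XML are hypothetical, not part of the original answer:

```python
import xml.etree.ElementTree as ET


def remove_by_tag(root, tags):
    """Remove every descendant of root whose tag is in tags (hypothetical helper)."""
    # Map each child to its parent before changing anything.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Collect the matches first so the tree is not modified while iterating.
    targets = [elem for elem in root.iter() if elem.tag in tags]
    for elem in targets:
        parent = parent_of.get(elem)
        if parent is not None:  # the root itself has no parent; leave it alone
            parent.remove(elem)
    return root


# Made-up sample input for illustration.
sample = ET.fromstring(
    '<tree><category><subitem><subsubitem/></subitem><keep/></category></tree>'
)
remove_by_tag(sample, {'subitem', 'subsubitem'})
remaining_tags = [elem.tag for elem in sample.iter()]
```

Either way, note that removing an element also drops its tail text, because ElementTree stores the text that follows an element on the element itself.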

ZDunker

To remove instances of subsubitem or subitem, no matter what their depth, consider the following example (with the caveat that it uses lxml.etree rather than upstream ElementTree):

import lxml.etree as etree

el = etree.fromstring('<root><item><subitem><subsubitem/></subitem></item></root>')
for child in el.xpath('.//subsubitem | .//subitem'):
  child.getparent().remove(child)
Charles Duffy
  • Thank you Charles, this works for most cases. However, it requires installing `lxml`. I'm working with a large number of machines, so I have to stick with the standard xml library... – ZDunker Mar 13 '17 at 23:56
  • Have you tried `el.iter('subitem')` and `el.iter('subsubitem')`? I don't make a habit of using upstream ElementTree, but would expect them to work. – Charles Duffy Mar 13 '17 at 23:59
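
Following up on the last comment: `el.iter('subitem')` does find matches at any depth in the standard library, but the elements it returns carry no reference to their parent. One possible workaround is the limited XPath support in `xml.etree.ElementTree`, where a trailing `..` selects the parent. This is a minimal sketch on a made-up document, not code from either answer:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input with targets at different depths.
el = ET.fromstring(
    '<root><item><subitem><subsubitem/></subitem></item><subsubitem/></root>'
)

for tag in ('subitem', 'subsubitem'):
    # './/tag/..' selects each element that has a direct child named tag.
    # findall() returns a list, so modifying the tree inside the loop is safe.
    for parent in el.findall('.//{}/..'.format(tag)):
        for child in parent.findall(tag):
            parent.remove(child)

remaining = [elem.tag for elem in el.iter()]
```

This mirrors the lxml version's `child.getparent().remove(child)` while staying within the standard library.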