
Suppose an XML file (in.xml) contains the following:

<?xml version="1.0" encoding="ASCII"?>
<a>
  <b>
    <c>abc</c>
  </b>
</a>

I run this code to strip the <b> tag and write the result to a file:

from lxml import etree

tree = etree.parse('in.xml', parser=etree.XMLParser())
root = tree.getroot()
etree.strip_tags(root, 'b')
tree.write('out.xml', xml_declaration=True)

The output file (out.xml) looks like this:

<?xml version='1.0' encoding='ASCII'?>
<a>
  
    <c>abc</c>
  
</a>

How can I remove the blank lines left by the stripped tag?

mrgou
  • Correct me if I'm wrong, but a parsed XML document is not treated as text; every node is an object in its own right. When you call strip_tags, you delete the element itself but not its descendants, text, or tail. In fact, they are merged into the parent of the stripped tag. – UnsignedFoo Oct 07 '22 at 15:54
  • This is it: https://stackoverflow.com/a/9612463/2834978 – LMC Oct 07 '22 at 15:55
  • You're right, `tree = etree.parse('in.xml', parser=etree.XMLParser(remove_blank_text=True))` works. I thought whitespace would be handled at write time and that any parser setting passed to `parse` would only apply when reading the file. Thanks! – mrgou Oct 08 '22 at 10:15
  • @UnsignedFoo, you are correct. – mrgou Oct 08 '22 at 10:16
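
For reference, the fix from the comments can be sketched as a self-contained script. This is a minimal sketch, not an official answer: it parses the sample document from an in-memory bytes string instead of in.xml so that it runs on its own, and it adds `pretty_print=True` on output, which is an assumption on my part (the comments only mention `remove_blank_text=True`). The names `XML`, `parser` and `result` are illustrative.

```python
from lxml import etree

# Same content as in.xml, held in memory so the example is self-contained.
XML = b"""<?xml version="1.0" encoding="ASCII"?>
<a>
  <b>
    <c>abc</c>
  </b>
</a>
"""

# remove_blank_text=True tells the parser to drop whitespace-only text
# between elements, so stripping <b> has no whitespace to merge into <a>.
parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(XML, parser)

# strip_tags removes the <b> element but keeps its children, which are
# merged into the parent <a>.
etree.strip_tags(root, 'b')

# Assumption: pretty_print=True is added here to re-indent the output,
# since the original indentation was discarded at parse time.
result = etree.tostring(
    root, xml_declaration=True, encoding='ASCII', pretty_print=True
)
print(result.decode('ascii'))
```

With a file on disk, the same parser can be passed to `etree.parse('in.xml', parser=parser)`, and `tree.write('out.xml', xml_declaration=True, pretty_print=True)` should serialize the result the same way.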
