
I have an XML file generated by an IDE. Unfortunately, the IDE writes line breaks in code as <br /> tags and seems to place the actual newlines at random. For example:

if test = true
    foo;
    bar;
endif

becomes the following XHTML within an XML file:

<body>
    <p>if test = true<br />    foo;<br />    bar;<br />endif
    </p>
</body>

I am trying to write a pre-processor for these files in Python using lxml, to make them easier to version control. However, I cannot figure out how to modify the XML as text so that each <br /> is placed on its own line, like the following:

<body>
<p>if test = true
    <br />    foo;
    <br />    bar;
    <br />endif
</p>
</body>

How does one edit the XML as text, or failing that, is there another way to get results like the above?

alecxe
user1601333

1 Answer


One option would be to append a newline character to the p tag's text and to each br tag's tail. Example:

from lxml import html

data = """
<html>
<body>
<p>if test = true<br />    foo;<br />    bar;<br />endif
</p>
</body>
</html>
"""

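tree = html.fromstring(data)

# Append a newline to the text that precedes the first <br>.
p = tree.find('.//p')
p.text = (p.text or '') + '\n'

# The text following each <br> is stored in its tail (None when there is none).
for element in tree.xpath('.//br'):
    element.tail = (element.tail or '') + '\n'

# encoding='unicode' returns a str rather than bytes on Python 3.
print(html.tostring(tree, encoding='unicode'))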

Prints:

<html>
<body>
<p>if test = true
<br>    foo;
<br>    bar;
<br>endif

</p>
</body>
</html>
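
Note that lxml.html serializes as HTML, so <br /> comes back as <br>, and the snippet above puts each newline after the text that follows a <br>. If the file has to stay XML with each <br /> starting its own line, as in the question, one possible variation is to parse with an ElementTree-style XML parser and append the newline to whatever text precedes each <br />. The following is only a minimal sketch under that assumption: it prefers lxml.etree and falls back to the standard library's xml.etree.ElementTree (which shares the text/tail model) purely so it runs without extra installs, and the sample string stands in for the real file. It does not add indentation before the <br /> tags.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback for illustration only; the text/tail model is the same.
    import xml.etree.ElementTree as etree

# Sample input copied from the question; a real script would read the file.
data = """<body>
    <p>if test = true<br />    foo;<br />    bar;<br />endif
    </p>
</body>"""

root = etree.fromstring(data)

for parent in root.iter():
    prev = None  # previous sibling of the child currently being visited
    for child in parent:
        if child.tag == "br":
            if prev is None:
                # First child: the preceding text is the parent's text.
                parent.text = (parent.text or "") + "\n"
            else:
                # Otherwise the preceding text is the previous sibling's tail.
                prev.tail = (prev.tail or "") + "\n"
        prev = child

result = etree.tostring(root, encoding="unicode")
print(result)
```

The exact spelling of the empty tag (<br/> versus <br />) depends on which serializer ends up being used, so it is worth checking the output against what the IDE expects before committing it.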
alecxe
  • Thank you. This worked perfectly. I did need to add some testing to make sure that I hadn't already added a \n at the end in case I had run the script on the same file twice, but other than that this was exactly what I was looking for. – user1601333 Oct 03 '14 at 22:09
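
The check mentioned in the comment could look roughly like the sketch below. The helper name ensure_trailing_newline is hypothetical (it is not part of lxml or of the original answer); it only appends a newline when the text does not already end with one, so running the script on the same file twice should leave it unchanged.

```python
def ensure_trailing_newline(text):
    """Return text with a trailing newline, adding one only if it is missing.

    None (what lxml reports for an empty .text or .tail) is treated as "".
    """
    text = text or ""
    return text if text.endswith("\n") else text + "\n"


# Applying the helper twice gives the same result as applying it once.
once = ensure_trailing_newline("    foo;")
twice = ensure_trailing_newline(once)
print(repr(once))   # '    foo;\n'
print(repr(twice))  # '    foo;\n'
```

In the answer's code this would presumably replace the plain concatenation, e.g. p.text = ensure_trailing_newline(p.text) and element.tail = ensure_trailing_newline(element.tail).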