I'm trying to turn this:

<note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
</note>

into this:

<note>
<to>
Tove
</to>
<from>
Jani
</from>
<heading>
Reminder
</heading>
<body>
Don't forget me this weekend!
</body>
</note>

using the Python library lxml. I'm very new to it and would also appreciate any resources to learn from.

CSStudent
  • What would the point of this be? – Marc B Feb 07 '14 at 01:09
  • I'm using this output to diff large XML files that may not have consistent white space. – CSStudent Feb 07 '14 at 01:22
  • Then look at this: http://stackoverflow.com/questions/1871076/are-there-any-free-xml-diff-merge-tools-available. You shouldn't be comparing XML using string operations. – Marc B Feb 07 '14 at 01:23
  • This isn't my project. I'm just writing a script for my boss, and he wants them compared using diff for some reason. It seems simple enough, but I can't figure out how to do it using lxml or ElementTree. – CSStudent Feb 07 '14 at 01:31
  • Then tell your *boss* that trying to diff XML files this way is a terrible idea. – larsks Feb 07 '14 at 01:40

1 Answer

While it's possible to tackle this with lxml, I think it ends up being needlessly complicated, mostly because the transformation makes little sense from an XML point of view. So let's use tools that don't know XML from bupkus.

Assuming that you have your data in a file called data.xml, this might work:

sed '
  s/</\n</g
  s/>/>\n/g
' data.xml | sed '/^ *$/ d'

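This assumes GNU sed, which turns \n in the replacement text into a newline. The first sed command inserts a newline before every < and after every >, and the second deletes any lines that are empty or contain only spaces.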

I still think this is a terrible idea, but maybe this will work. Given your sample input above, this produces:

<note>
<to>
Tove
</to>
<from>
Jani
</from>
<heading>
Reminder
</heading>
<body>
Don't forget me this weekend!
</body>
</note>
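To make the two steps of the sed pipeline above explicit, here is a rough Python equivalent. It is only a sketch: the helper name spread_tags is hypothetical, it works on the whole string rather than line by line as sed does, and it is still plain string manipulation, not XML parsing.

```python
def spread_tags(xml_text: str) -> str:
    """Rough equivalent of the two-sed pipeline (hypothetical helper name)."""
    # Like s/</\n</g and s/>/>\n/g: newline before every '<' and after every '>'
    spread = xml_text.replace("<", "\n<").replace(">", ">\n")
    # Like sed '/^ *$/ d': drop lines that are empty or contain only spaces
    kept = [line for line in spread.split("\n") if line.strip(" ")]
    return "\n".join(kept) + "\n" if kept else ""


if __name__ == "__main__":
    sample = """<note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
</note>
"""
    print(spread_tags(sample), end="")
```

As with the sed version, lines indented with tabs rather than spaces are not treated as blank.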
larsks
  • This does work, but I can't use regular expressions to edit the XML. I know, it doesn't seem to make sense, but I'm really not looking for a lecture. Just looking for a way to do it purely using an XML parser. – CSStudent Feb 11 '14 at 21:46
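For a parser-only approach, one option is to parse the document and walk the tree, writing each start tag, each non-blank piece of text, and each end tag on its own line. The sketch below uses the standard library's xml.etree.ElementTree. The helper names one_tag_per_line and _emit are hypothetical. Sorting attributes and stripping whitespace are choices I made to suit diffing, not something the question specified. lxml.etree follows the same API for fromstring, .tag, .attrib, .text and .tail, so `from lxml import etree as ET` should work as a swap. The isinstance check is there because lxml, unlike the stdlib parser, keeps comments in the tree by default, and their .tag is not a string.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def _emit(elem, out):
    # Start tag on its own line; attributes sorted so attribute order can't cause diff noise
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in sorted(elem.attrib.items()))
    out.append(f"<{elem.tag}{attrs}>")
    # Text before the first child, with surrounding whitespace stripped
    if elem.text and elem.text.strip():
        out.append(escape(elem.text.strip()))
    for child in elem:
        # Skip comments/processing instructions (lxml keeps them; their .tag is not a str)
        if isinstance(child.tag, str):
            _emit(child, out)
        # Tail text follows the child's end tag and belongs to this element's content
        if child.tail and child.tail.strip():
            out.append(escape(child.tail.strip()))
    out.append(f"</{elem.tag}>")


def one_tag_per_line(xml_text):
    """Parse xml_text and return one tag or text node per line (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    lines = []
    _emit(root, lines)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    compact = ("<note><to>Tove</to><from>Jani</from><heading>Reminder</heading>"
               "<body>Don't forget me this weekend!</body></note>")
    print(one_tag_per_line(compact), end="")
```

Because whitespace between tags is discarded, the indented and the compact versions of the note should produce identical output, which is what makes a line-based diff meaningful afterwards. Known limits of this sketch: empty elements come out as a start/end pair, namespaced tags appear in ElementTree's {uri}local form, and comments are dropped.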