I want to transform an XML document which I have parsed with XmlSlurper. The (identical) XML tag names should be replaced with the value of the id attribute; all other attributes should be dropped. Starting from this code:

def xml = """<tag id="root">
            |  <tag id="foo" other="blah" more="meh">
            |    <tag id="bar" other="huh"/>
            |  </tag>
            |</tag>""".stripMargin()

def root = new XmlSlurper().parseText(xml)

// Some magic here.

println groovy.xml.XmlUtil.serialize(root)

I want to get the following:

<root>
  <foo>
    <bar/>
  </foo>
</root>

(I write test assertions on the XML, and want to simplify the structure for them.) I've read Updating XML with XmlSlurper and searched around, but found no way with replaceNode() or replaceBody() to exchange a node while keeping its children.

Ingo Karkat
  • I assume some tags also have content? If not, you can do: `root.breadthFirst().each { n -> n.replaceNode { "${n.@id}"( n.children() ) } }` But this will lose any content in the nodes – tim_yates Feb 18 '13 at 14:19
  • @tim_yates: Currently, there's no content, so your solution (though incomplete) would already work for me. Please post it as an answer! – Ingo Karkat Feb 18 '13 at 15:22
  • done :-) I'll have a think about a more generalised recursive function – tim_yates Feb 18 '13 at 15:25
  • Thank you very much for your answer; I'm anxious to see a general solution. – Ingo Karkat Feb 18 '13 at 15:41
  • Added what seems -- after a quick test -- a more general solution – tim_yates Feb 18 '13 at 15:48
  • Great. The biggest drawback I see is that one ends up with a String, not a `GPathResult`, so if there's further processing (and not just printing like in this toy example), it would have to be re-parsed again. – Ingo Karkat Feb 18 '13 at 16:06
  • True, but I don't believe it's possible using `XmlSlurper`'s data structure, and `XmlParser` doesn't let you replace the root node – tim_yates Feb 18 '13 at 16:10

1 Answer

Adding the 'magic' into the code from the question gives:

def xml = """<tag id="root">
            |  <tag id="foo" other="blah" more="meh">
            |    <tag id="bar" other="huh"/>
            |  </tag>
            |</tag>""".stripMargin()

// On Groovy 4+, XmlSlurper needs "import groovy.xml.XmlSlurper" (it is no longer in groovy.util).
def root = new XmlSlurper().parseText(xml)

root.breadthFirst().each { n ->
  n.replaceNode { 
    "${n.@id}"( n.children() )
  }
}

println groovy.xml.XmlUtil.serialize(root)

Which prints:

<?xml version="1.0" encoding="UTF-8"?><root>
  <foo>
    <bar/>
  </foo>
</root>

However, this drops any text content in the nodes, because only the child elements are copied into the replacement node. To keep the content, we would probably need to recurse over a tree parsed with XmlParser and generate a new document from the existing one... I'll have a think.

More general solution

I think this is more generalised:

import groovy.xml.*

def xml = """<tag id="root">
            |  <tag id="foo" other="blah" more="meh">
            |    <tag id="bar" other="huh">
            |      something
            |    </tag>
            |    <tag id="bar" other="huh">
            |      something else
            |    </tag>
            |    <noid>woo</noid>
            |  </tag>
            |</tag>""".stripMargin()

def root = new XmlParser().parseText( xml )

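def munge( builder, node ) {
  if( node instanceof Node && node.children() ) {
    // Element with children: name it after its id (or its tag name if there is no id) and recurse
    builder."${node.@id ?: node.name()}" {
      node.children().each {
        munge( builder, it )
      }
    }
  }
  else {
    if( node instanceof Node ) {
      // Empty element: the call has to go through the builder, otherwise Groovy looks for a script method
      builder."${node.@id ?: node.name()}"()
    }
    else {
      // Text content: emit it escaped
      builder.mkp.yield node
    }
  }
}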

def w = new StringWriter()
def builder = new MarkupBuilder( w )
munge( builder, root )

println XmlUtil.serialize( w.toString() )

And prints:

<?xml version="1.0" encoding="UTF-8"?><root>
  <foo>
    <bar>something</bar>
    <bar>something else</bar>
    <noid>woo</noid>
  </foo>
</root>

This version also passes through nodes with no (or an empty) id attribute, keeping their original tag name, and it drops all attributes.
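
As noted in the comments on the question, this approach ends up with a String rather than a GPathResult. If the simplified document is needed for test assertions, one option is to re-parse that String with XmlSlurper. The following is only a minimal sketch under that assumption: `munge` is a condensed variant of the function above (repeated so the snippet runs on its own), and the sample document and the names `source` and `simplified` are made up for illustration.

```groovy
import groovy.xml.*

// Condensed variant of munge(): every element goes through the builder,
// every text child is yielded as escaped content.
def munge(builder, node) {
    if (!(node instanceof Node)) {
        builder.mkp.yield(node)
        return
    }
    builder."${node.@id ?: node.name()}" {
        node.children().each { munge(builder, it) }
    }
}

// Illustrative input, parsed with XmlParser as in the answer.
def source = new XmlParser().parseText(
    '<tag id="root"><tag id="foo" other="blah"><tag id="bar">something</tag></tag></tag>')

def writer = new StringWriter()
munge(new MarkupBuilder(writer), source)

// Re-parse the generated markup to get a GPathResult back.
def simplified = new XmlSlurper().parseText(writer.toString())

// The renamed tags can now be navigated with GPath.
assert simplified.name() == 'root'
assert simplified.foo.bar.text().trim() == 'something'
assert simplified.foo.@other.text() == ''   // attributes were dropped
```

The extra parse costs a little, but for test fixtures of this size that is unlikely to matter, and the assertions then read naturally against the id-based names.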

tim_yates