I have managed to figure out the primary part of my question, "how do I insert one XML document into another?" The result works, but the printed XML is missing a line break.

require 'rexml/document'

s = <<EOF
<application>
  <email>
    <host>mail.test.com</host>
    <port>25</port>
  </email>
</application>
EOF

p = <<EOF
<auth>
  <user>godber</user>
  <pass>foo</pass>
</auth>
EOF

subdoc = REXML::Document.new(p)
doc = REXML::Document.new(s)
doc.root.insert_after( '//email', subdoc.root )
doc.write

This outputs the following. As you can see, the auth tag starts immediately after the email close tag, without a newline:

<application>
  <email>
    <host>mail.test.com</host>
    <port>25</port>
  </email><auth>
  <user>godber</user>
  <pass>foo</pass>
</auth>
</application>

Actually, just as I finished writing this, I realized that I can change my last line to

doc.write( $stdout, 2 )

This is clearly described in the REXML tutorial; I had just overlooked it, assuming that something else was wrong. I will submit this anyway in case anyone else is puzzled by the same thing. If anyone has tips along these lines, I'd be happy to hear them.

godber

1 Answer

REXML is doing exactly what you are asking: doc.root.insert_after('//email', subdoc.root) means "put subdoc.root right after the first element that matches //email". And that email element ends exactly after the > in </email>.

Whitespace nodes, although often overlooked by human readers, cannot be ignored by XML parsers. The key thing here is that this XML document

<doc>
  <email>
  </email>
</doc>

does not consist of just an email element inside a doc element. In fact it is made of, in order,

  • root doc element,
  • one text node with the text "[newline][space][space]",
  • an email element that contains a text node "[newline][space][space]",
  • another text node with the text "[newline]".
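To see these nodes for yourself, here is a small sketch that lists the children of the root element. It assumes REXML's default settings, under which whitespace-only text nodes are kept; the variable names are mine, for illustration only.

```ruby
require 'rexml/document'

# The same small document as above, written with explicit newlines.
xml = "<doc>\n  <email>\n  </email>\n</doc>\n"
doc = REXML::Document.new(xml)

# Describe each child node of the root element as [kind, content].
children = doc.root.children.map do |node|
  case node
  when REXML::Text    then [:text, node.value]
  when REXML::Element then [:element, node.name]
  end
end

children.each { |kind, content| puts "#{kind}: #{content.inspect}" }
```

With the default settings this should list a text node, the email element, and a second text node, in that order.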

This means that REXML cannot arbitrarily add the whitespace needed to indent auth the way we expect, because doing so would change the text nodes of the document.

One way to work around this is to ask REXML to re-serialize the whole XML document; this is what you have done by calling the #write method with a positive indentation level. But you can do that only if whitespace is not significant in your document: would you let REXML reformat a snippet of carefully indented Ruby code?
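As a rough illustration of the difference, this sketch writes a shortened version of the document from the question twice: once as stored, and once with an indentation of 2. The shortened XML and the variable names are my own, and writing into a String instead of $stdout is only there to make the two results easy to compare.

```ruby
require 'rexml/document'

doc = REXML::Document.new(
  "<application>\n  <email>\n    <host>mail.test.com</host>\n  </email>\n</application>"
)
auth = REXML::Document.new("<auth>\n  <user>godber</user>\n</auth>")

# Same call as in the question: auth lands directly after </email>.
doc.root.insert_after('//email', auth.root)

# Default write: every node, including whitespace text, is emitted as stored.
raw = String.new
doc.write(raw)

# Positive indent: REXML re-indents the tree and drops whitespace-only text nodes.
pretty = String.new
doc.write(pretty, 2)

puts raw
puts '---'
puts pretty
```

The first output keeps </email><auth> on one line; the second puts auth on its own line, at the cost of rewriting all the whitespace in the document.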

gioele