
I'm using Ruby 1.9.3 and REXML to parse an XML document, make a few changes (additions/subtractions), then re-output the file. Within this file is a block that looks like this:

<someElement>
  some.namespace.something1=somevalue1
  some.namespace.something2=somevalue2
  some.namespace.something3=somevalue3
</someElement>

The problem is that after re-writing the file, this block always ends up looking like this:

<someElement>
  some.namespace.something1=somevalue1
  some.namespace.something2=somevalue2 some.namespace.something3=somevalue3
</someElement>

The newline after the second value (but never the first!) has been lost and turned into a space. Later, some other code which I have no control or influence over will read this file and depend on those newlines to parse the content properly. Generally in this situation I'd use a CDATA section to preserve the whitespace, but that isn't an option here because the code that parses this data later is not expecting one. It's essential that the inner text of this element is preserved exactly as-is.

My read/write code looks like this:

require 'rexml/document'

xmlFile = File.open(myFile)
contents = xmlFile.read
xmlDoc = REXML::Document.new(contents, { :respect_whitespace => :all })
xmlFile.close

# ... perform some tasks (add/remove elements) ...

out = ""
xmlDoc.write(out, 2)  # second argument is the indent
File.open(filePath, "w"){|file| file.puts(out)}

I'm looking for a way to preserve the whitespace of text inside elements when reading and writing a file in this manner using REXML. I've read a number of other questions here on Stack Overflow on this subject, but none that quite replicate this scenario. Any ideas or suggestions are welcome.

Dave Newton
Fopedush

1 Answer


I get correct behavior by removing the indent (second) parameter to Document.write():

#xmlDoc.write(out, 2)
xmlDoc.write(out)

That seems like a bug in Document.write() according to my reading of the docs, but if you don't really need to set the indentation, then leaving that argument off should solve your problem.
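For anyone who wants to check this on their own setup, here is a minimal sketch of the comparison. The sample XML, the `root` wrapper element, and the variable names are made up for illustration and only modelled on the block in the question. As far as I can tell, passing an indent switches REXML to its pretty-printing formatter, which may collapse and re-wrap whitespace in text nodes, while the default path writes text nodes back as they were read.

```ruby
require "rexml/document"

# Hypothetical sample input, modelled on the block in the question.
xml = [
  "<root>",
  "<someElement>",
  "  some.namespace.something1=somevalue1",
  "  some.namespace.something2=somevalue2",
  "  some.namespace.something3=somevalue3",
  "</someElement>",
  "</root>"
].join("\n")

doc = REXML::Document.new(xml, { :respect_whitespace => :all })

# Inner text of the element exactly as it was parsed, newlines included.
original_text = doc.root.elements["someElement"].text

# No indent argument: the text node should be written back unchanged.
preserved = String.new
doc.write(preserved)

# Explicit indent: the text node may be reflowed by the pretty printer.
reformatted = String.new
doc.write(reformatted, 2)

puts "default write keeps inner text:  #{preserved.include?(original_text)}"
puts "indented write keeps inner text: #{reformatted.include?(original_text)}"
```

If the first line reports `true` on your Ruby version, writing without the indent argument is enough to keep the element's inner text intact.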

Darshan Rivka Whittle
  • I'll give this a shot and report back. Retaining indention would certainly be a plus, though, as I'd like the file to remain easily human-readable if possible. – Fopedush Apr 10 '13 at 21:12
  • It gives you good, readable indentation by default; that option is supposed to override the default, but doesn't seem to work. – Darshan Rivka Whittle Apr 10 '13 at 21:13
  • It appears that removing that parameter completely solved the problem. I wish I could remember why I put it in there in the first place. Thanks for your prompt reply. – Fopedush Apr 10 '13 at 21:21
  • You're very welcome. To clarify my previous statement: it *preserves* the incoming indentation by default, so as long as that is readable and you aren't doing anything too drastic during processing, the output will be readable as well. – Darshan Rivka Whittle Apr 10 '13 at 21:25