I am creating an XML document using REXML:

require "rexml/document"
include REXML

File.open(xmlFilename, 'w') do |xmlFile|
    xmlDoc = Document.new
    # add stuff to the document...
    xmlDoc.write(xmlFile, 4)  # an indent of 4 selects the pretty formatter
end

Some of the elements contain quite a few attributes, so the corresponding lines can get quite long. If they get longer than 166 characters, REXML inserts a line break. This is of course still perfectly valid XML, but my workflow includes some diffing and merging, which works best if each element is contained on one line.

So, is there a way to make REXML not insert these line-wrapping line breaks?

Edit: I ended up pushing the finished XML file through tidy as the last step of my script. If someone knows a nicer way to do this, I would still be grateful.

bastibe

2 Answers

As Ryan Calhoun said in his answer, REXML uses 80 as its line-wrap length. I'm pretty sure this is a bug (although I couldn't find a bug report just now). I was able to fix it by overriding the write_text method of the Formatters::Pretty class in a subclass, so that it uses the configurable @width attribute instead of the hard-coded 80.

require "rubygems"
require "rexml/document"
include REXML

long_xml = "<root><tag>As Ryan Calhoun said in his previous answer, REXML uses 80 as its wrap line length.  I'm pretty sure this is a bug (although I couldn't find a bug report just now).  I was able to *fix* it by overwriting the Formatters::Pretty class's write_text method.</tag></root>"

xml = Document.new(long_xml)

#fix bug in REXML::Formatters::Pretty
class MyPrecious < REXML::Formatters::Pretty
    def write_text( node, output )
        s = node.to_s()
        s.gsub!(/\s/,' ')
        s.squeeze!(" ")

        #The Pretty formatter code mistakenly used 80 instead of the @width variable
        #s = wrap(s, 80-@level)
        s = wrap(s, @width-@level)

        s = indent_text(s, @level, " ", true)
        output << (' '*@level + s)
    end
end

printer = MyPrecious.new(5)
printer.width = 1000
printer.compact = true
printer.write(xml, STDOUT)
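
To tie this back to the file-writing code in the question, here is a minimal sketch that writes through the same kind of subclass into a file and checks that the long text stays on one line. The class name WideFormatter, the temp-file name, and the sample text are made up for illustration; the override is repeated only so the sketch runs on its own.

```ruby
require "rexml/document"
require "rexml/formatters/pretty"
require "tmpdir"

# Hypothetical subclass name; same idea as the override above.
class WideFormatter < REXML::Formatters::Pretty
  def write_text(node, output)
    s = node.to_s.gsub(/\s/, " ").squeeze(" ")
    s = wrap(s, @width - @level)            # honor the configured width
    s = indent_text(s, @level, " ", true)
    output << (" " * @level + s)
  end
end

long_text = (["word"] * 60).join(" ")       # 299 characters of text
doc = REXML::Document.new("<root><tag>#{long_text}</tag></root>")

formatter = WideFormatter.new(4)            # indent by 4 spaces
formatter.width = 1000                      # wide enough for this sample
formatter.compact = true                    # keep text-only elements on one line

# Illustrative file name in the system temp directory.
path = File.join(Dir.tmpdir, "rexml_wide_example.xml")
File.open(path, "w") { |file| formatter.write(doc, file) }

lines = File.readlines(path)
File.delete(path)

puts "lines written: #{lines.length}"
puts "longest line:  #{lines.map { |line| line.chomp.length }.max}"
```

The width of 1000 is an arbitrary choice; pick something larger than the longest text node you expect.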
Doug

Short answer: yes and no.

REXML uses different formatters based on the value you specify for indent. If you leave the default of -1, it uses REXML::Formatters::Default. If you give it a value like 4, it uses REXML::Formatters::Pretty. The pretty formatter does have logic to wrap lines (though it looks like it wraps at 80 characters, not 166), but only for text content, not for tags or attributes. For example, the contents of

<p> a paragraph tag </p>

would be wrapped at 80 characters, but

<a-tag with='a' long='list' of='attributes'/>

would not be wrapped.
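
A small sketch that should make the difference visible, assuming a stock REXML where the pretty formatter's wrap width is 80. The sample text and attribute names are invented for the demonstration.

```ruby
require "rexml/document"

long_text  = (["word"] * 40).join(" ")   # 199 characters of text
attributes = (1..30).map { |i| "attr#{i}='value#{i}'" }.join(" ")
doc = REXML::Document.new("<root><p>#{long_text}</p><a-tag #{attributes}/></root>")

pretty_output = String.new
doc.write(pretty_output, 4)              # indent > -1 selects the pretty formatter

# The element with many attributes stays on a single line...
tag_line = pretty_output.lines.find { |line| line.include?("<a-tag") }
puts "attribute line length: #{tag_line.chomp.length}"

# ...while the paragraph text gets line breaks inserted.
puts "text kept on one line? #{pretty_output.include?(long_text)}"
```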

Anyway, the 80 is hard-coded in rexml/formatters/pretty.rb and is not configurable. If you use the default formatter with no indent, the output is mostly a raw dump without added line breaks. You could try the transitive formatter (see the docs for Document.write), but it's broken in some versions of Ruby and might require a code hack. It probably isn't what you want anyway.
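
For comparison, a minimal sketch of the default formatter's behavior when no indent is given. The sample document is invented; the point is only that no line breaks are added.

```ruby
require "rexml/document"

text = (["word"] * 40).join(" ")         # well over 80 characters
doc = REXML::Document.new("<root><p>#{text}</p><empty/></root>")

raw_output = String.new
doc.write(raw_output)                    # indent defaults to -1: default formatter

# Using the default formatter explicitly should give the same result.
explicit_output = String.new
REXML::Formatters::Default.new.write(doc, explicit_output)

puts raw_output.include?("\n") ? "line breaks added" : "no line breaks added"
```

The flip side is that the whole document ends up on one line, which may be no better for diffing than wrapped text.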


You might try taking a look at Builder::XmlMarkup from the builder gem.

Ryan Calhoun