How to get all leaf elements from an REXML element and save them into an array?

Question

I have a Ruby REXML element like the one below:

<a_1>
  <Tests>
    <test enabled='1'>trans </test>
    <test enabled='1'>ac </test>
    <test enabled='1'>dc </test>
  </Tests>
  <Corners>
    <corner enabled='0'>default</corner>
    <corner enabled='1'>C0 </corner>
  </Corners>
</a_1>

I want to find all leaf elements, so the result should be:

<test enabled='1'>trans </test>
<test enabled='1'>ac </test>
<test enabled='1'>dc </test>
<corner enabled='0'>default</corner>
<corner enabled='1'>C0 </corner>

My code is:

require 'rexml/document' 
include  REXML

def getAllLeaf(xmlElement)
  if xmlElement.has_elements?
    xmlElement.elements.each {|e| 
      getAllLeaf(e)
    }
  else
    return xmlElement
  end
end

It works fine and did show the right outputs on screen. However, I ran into a problem when I tried to save the results of this recursive procedure into an Array. Is there a way to collect this output into a single array that can be used afterwards?

I worked out a recursive way to do it. It's a little odd, but I'd like to share it:

def getAllLeaf(eTop,aTemp=Element.new("LeafElements"))
  if eTop.has_elements?
    eTop.elements.each {|e| 
      getAllLeaf(e,aTemp)
    }
  else
    aTemp<< eTop.dup
  end
  return aTemp
end
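For comparison, here is a minimal sketch of the same accumulator idea that collects the leaves into a plain Ruby Array instead of a wrapper Element. The method name get_all_leaves is hypothetical. It uses REXML's each_element, which, like elements.each, visits only child elements, so whitespace text nodes are skipped.

```ruby
require "rexml/document"

# Hypothetical helper: collects every leaf element (one with no child
# elements) into a plain Ruby Array. The same array is passed down the
# recursion, so every level appends to it.
def get_all_leaves(element, leaves = [])
  if element.has_elements?
    # each_element yields only child Elements, never whitespace Text nodes
    element.each_element { |child| get_all_leaves(child, leaves) }
  else
    leaves << element
  end
  leaves
end

xml = <<'END_OF_XML'
<a_1>
  <Tests>
    <test enabled='1'>trans </test>
    <test enabled='1'>ac </test>
    <test enabled='1'>dc </test>
  </Tests>
  <Corners>
    <corner enabled='0'>default</corner>
    <corner enabled='1'>C0 </corner>
  </Corners>
</a_1>
END_OF_XML

doc = REXML::Document.new(xml)
leaves = get_all_leaves(doc.root)

# leaves is an ordinary Array of REXML::Element objects
leaves.each { |element| puts element }
```

The array holds references to the original elements, not copies. If you need detached copies (which the dup in the code above seems intended to provide), you could push element.deep_clone instead.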

7stud · Accepted Answer · 2014-05-25T19:30:03.813

"It works fine and did show the right outputs on screen."

In fact, the code produces no output anywhere. In any case, your recursive method doesn't work, which you can see if you call it on the <Tests> element when <Tests> looks like this:

  <Tests>
    <test enabled='1'>
      <HELLO>world</HELLO>
    </test>
    <test enabled='1'>ac </test>
    <test enabled='1'>dc </test>
  </Tests>

Your recursive method doesn't work because when you write:

xmlElement.elements.each {|e|

the each() method returns the collection it iterated over (for REXML's Elements#each, an array of xmlElement's child elements), not anything your block computes. The values returned by your recursive calls are simply thrown away. Given your xml, your recursive method is equivalent to:

def getAllLeaf(xmlElement)
    xmlElement.elements.each {|e| 
      "blah"  #your code here has no effect on what each() returns.
    }
end

...which is equivalent to:

def getAllLeaf(xmlElement)
    return xmlElement.elements.to_a
end

Do you really need recursion? It's much simpler to search all the elements for the ones that have no child elements:

require "rexml/document"
include REXML

xml = <<'END_OF_XML'
<a_1>
  <Tests>
    <test enabled='1'>trans </test>
    <test enabled='1'>ac </test>
    <test enabled='1'>dc </test>
  </Tests>
  <Corners>
    <corner enabled='0'>default</corner>
    <corner enabled='1'>C0 </corner>
  </Corners>
</a_1>
END_OF_XML

doc = Document.new xml
root = doc.root

XPath.each(root, "//*") do |element|
  if not element.has_elements?
    enabled = element.attributes['enabled'] 
    text = element.text
    puts "#{enabled} ... #{text}"
  end
end

--output:--
1 ... trans 
1 ... ac 
1 ... dc 
0 ... default
1 ... C0
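Because the question asks for an array rather than printed output, here is a minimal variation (a sketch, not part of the original answer). XPath.match returns an ordinary Array of matching nodes, so it can gather the elements and then filter out the ones that have children:

```ruby
require "rexml/document"

xml = <<'END_OF_XML'
<a_1>
  <Tests>
    <test enabled='1'>trans </test>
    <test enabled='1'>ac </test>
    <test enabled='1'>dc </test>
  </Tests>
  <Corners>
    <corner enabled='0'>default</corner>
    <corner enabled='1'>C0 </corner>
  </Corners>
</a_1>
END_OF_XML

root = REXML::Document.new(xml).root

# "//*" matches elements only (no text nodes), so calling has_elements?
# on every match is safe; reject keeps just the leaves.
leaves = REXML::XPath.match(root, "//*").reject(&:has_elements?)

leaves.each do |element|
  puts "#{element.attributes['enabled']} ... #{element.text}"
end
```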

Or, if the leaf elements are the only elements with an "enabled" attribute, you can simply do this:

XPath.each(root, "//*[@enabled]") do |element|
  enabled = element.attributes['enabled'] 
  text = element.text
  puts "#{enabled} ... #{text}"
end
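The same attribute query can also build an array of plain values rather than elements. In the sketch below, the [enabled, text] pair layout and the stripping of trailing spaces are illustrative choices only:

```ruby
require "rexml/document"

xml = <<'END_OF_XML'
<a_1>
  <Tests>
    <test enabled='1'>trans </test>
    <test enabled='1'>ac </test>
    <test enabled='1'>dc </test>
  </Tests>
  <Corners>
    <corner enabled='0'>default</corner>
    <corner enabled='1'>C0 </corner>
  </Corners>
</a_1>
END_OF_XML

root = REXML::Document.new(xml).root

# One [enabled, text] pair per element that carries an enabled attribute
pairs = REXML::XPath.match(root, "//*[@enabled]").map do |element|
  [element.attributes["enabled"], element.text.strip]
end

p pairs
```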

There's even a cryptic XPath expression that directly selects elements without element children:

XPath.each(root, "//*[not(*)]") do |element|
  enabled = element.attributes['enabled'] 
  text = element.text
  puts "#{enabled} ... #{text}"
end
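Applied to the nested <Tests> example from earlier, a sketch using XPath.match with this expression collects the true leaves at any depth, including <HELLO>, into an Array:

```ruby
require "rexml/document"

xml = <<'END_OF_XML'
<Tests>
  <test enabled='1'>
    <HELLO>world</HELLO>
  </test>
  <test enabled='1'>ac </test>
  <test enabled='1'>dc </test>
</Tests>
END_OF_XML

doc = REXML::Document.new(xml)

# not(*) is true for an element with no element children. Whitespace
# text nodes are not elements, so they do not count as children here.
leaves = REXML::XPath.match(doc, "//*[not(*)]")

leaves.each { |e| puts "#{e.name}: #{e.text.strip}" }
```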

Also, have you considered using the nokogiri gem? It's pretty much Ruby's de facto standard XML/HTML parser.

Thanks 7stud, your solution is quite good and works perfectly on my side. — user3672656, May 25 '14 at 20:40
Thanks, 7stud, for your explanation and solution. The solution is good and works perfectly on my side. Sorry, I am new to Ruby, have just begun to use REXML, and hadn't thought about XPath. It looks really powerful and I think I should learn a bit more about it. — user3672656, May 25 '14 at 20:47
I don't know how much you know about XML parsing, but all text is enclosed in a text node. That rule applies to newlines too. For instance, in your original xml there is a newline immediately after the <a_1> tag. Unfortunately, when you step through all the nodes in a document, text nodes are not Elements, so calling has_elements? on a text node such as "\n" raises a NoMethodError. Just execute `p root.to_a` to see all the direct children of root, and you'll see what I'm talking about. The nice thing about XPath is that it fetches only the named tags, leaving out the newline nodes. — 7stud, May 25 '14 at 22:59
By the way, you can make REXML skip newline text nodes without having to use XPath if you create the doc like this: `doc = Document.new(xml, :ignore_whitespace_nodes=>:all)` — 7stud, May 25 '14 at 22:59
Hi 7stud, thank you very much for your detailed explanation. Yes, I got the newline error, so I added a check to ignore it (odd again). I need to work harder on Ruby and REXML... Thanks again — user3672656, May 26 '14 at 21:30
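The whitespace-node comments above can be checked with a short sketch. It shows the whitespace Text nodes among the children of <a_1>, then a recursive walk that skips non-Element nodes with a type check (one possible form of the "check" mentioned above), then the same walk without the check once the document is parsed with :ignore_whitespace_nodes => :all. Both method names are hypothetical.

```ruby
require "rexml/document"

xml = <<'END_OF_XML'
<a_1>
  <Tests>
    <test enabled='1'>trans </test>
    <test enabled='1'>ac </test>
    <test enabled='1'>dc </test>
  </Tests>
  <Corners>
    <corner enabled='0'>default</corner>
    <corner enabled='1'>C0 </corner>
  </Corners>
</a_1>
END_OF_XML

# 1. Default parsing keeps whitespace-only Text nodes between the tags.
plain = REXML::Document.new(xml)
child_kinds = plain.root.to_a.map(&:class)
p child_kinds

# Walking raw child nodes therefore needs a type check before calling
# Element-only methods such as has_elements?.
def leaves_with_guard(element)
  return [element] unless element.has_elements?
  element.to_a.flat_map do |child|
    child.is_a?(REXML::Element) ? leaves_with_guard(child) : [] # skip Text nodes
  end
end

# 2. With :ignore_whitespace_nodes => :all those nodes are never created,
#    so for this XML (no mixed content or comments) every child of a
#    non-leaf element is an Element and no type check is needed.
compact = REXML::Document.new(xml, ignore_whitespace_nodes: :all)

def leaves_without_guard(element)
  return [element] unless element.has_elements?
  element.to_a.flat_map { |child| leaves_without_guard(child) }
end

guarded = leaves_with_guard(plain.root)
unguarded = leaves_without_guard(compact.root)
p unguarded.map(&:text)
```

Both walks return the same five leaf elements in an Array; the second relies on the parse option instead of a type check.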
