
I tried to read the following file:

with the code below:

require 'rexml/document'
include REXML

str = File.read("../pages/prac.xml").gsub(/\s+/, " ")

page = REXML::Document.new(str)
print "no elements\n" if page.root.has_elements?
print "Text: #{page.root.text}\n"
print "Name: #{page.root.name}\n"

page.root.each_element do |parent_tag|
    parent_tag.each_element do |tag|
        if tag.has_elements?
            tag.each_element do |data|
                p data
            end
        else
            puts "#{tag.name}: #{tag.text}"
        end
    end
end

The output I am seeing is:

no elements
Text:  
Name: html

Can someone help me by pointing out what is wrong here?

MBO
Karthick S

1 Answer

print "no elements\n" if page.root.has_elements?

page.root.has_elements? returns true if the root element has child elements. Your code prints "no elements" exactly when the root element does have child elements, so the message is misleading as written. It should probably read "has elements" instead.

Secondly, page.root.name returns the name of the root element of the XML document, hence "html" in your case. However, page.root.text returns the root's first text node, not the text of a child element. Here that node is probably a single blank space, so nothing visible is printed.
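
A minimal sketch of that difference, assuming a small stand-in document (the original prac.xml is not shown, so the markup below is hypothetical):

```ruby
require 'rexml/document'

# Hypothetical sample markup; it only mimics the whitespace-collapsed
# string produced by gsub(/\s+/, " ") in the question.
xml = "<html> <head>One line</head> <body><p>Hello</p></body> </html>"
root = REXML::Document.new(xml).root

root_name         = root.name                    # name of the root element
root_has_elements = root.has_elements?           # true when child elements exist
root_text         = root.text                    # first text node of the root itself
head_text         = root.elements['head'].text   # text inside the <head> child

puts "Name: #{root_name}"
puts(root_has_elements ? "has elements" : "no elements")
puts "Root text: #{root_text.inspect}"
puts "Head text: #{head_text.inspect}"
```

Printing with inspect makes a whitespace-only text node visible instead of looking like empty output.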

rbnewb
  • Thanks for the response, rbnewb. There is no error in the XML (checked online): One line. I am not able to understand why there are no children for html in this case. – Karthick S Jun 02 '12 at 15:32
  • Using your example XML I was able to return the head tag text. I opened a new irb session and entered the following (each line separated by a comma below): __require 'rexml/document'__, __include REXML__, __s = ' One line '__, __doc = Document.new(s)__, __head_tag = doc.root.get_elements('//head')__, __puts head_tag.first.text__, and I get a result of __One line__. You might also want to use __doc.root.children.each {|n| puts n}__ (where doc is a REXML document) to print out all children of the root node and get a better understanding of what REXML is seeing. – rbnewb Jun 03 '12 at 00:11
  • Thanks. Instead of using page.root.children.each I used page.root.elements.size(). That worked fine. – Karthick S Jun 03 '12 at 11:09
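
The difference between children and elements discussed in these comments can be sketched as follows. The markup is again a hypothetical stand-in, since the original file is not shown. children includes whitespace text nodes, while elements only covers child elements.

```ruby
require 'rexml/document'

# Hypothetical sample markup, not the asker's actual file.
xml = "<html> <head>One line</head> <body>Text</body> </html>"
root = REXML::Document.new(xml).root

child_count   = root.children.size               # text nodes and elements together
element_count = root.elements.size               # child elements only
element_names = root.elements.to_a.map(&:name)   # names of the child elements
head_text     = root.get_elements('//head').first.text

puts "children: #{child_count}, elements: #{element_count}"
puts "element names: #{element_names.join(', ')}"
puts "head text: #{head_text}"
```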