I want to drill down the tree, and store all the levels:

search_q = Regexp.new("Some search regex here")
#something like: page.search('body').first.children.select {|x| x.text[search_q]}.first.children.select {|x| x.text[search_q]}.first......ad infinitum.

I've done a hack:

arbitrarily_long_number = 100
drill = []
(0..arbitrarily_long_number).collect do |n|
  begin
    drill << eval("page.search('body')"+".first.children.select {|x| x.text[search_q]}" * n)
  rescue
    break
  end
end

The problem is that this drills only through the "first" selection. Is there a way to make it drill through every node? I'm thinking of some sort of inject function, but I still haven't wrapped my head around it. Any help would be appreciated.

Output:

pp drill[-4]
puts
pp drill[-3]
puts
pp drill[-2]
#=>[#(Element:0x3fc2324522b4 {
   name = "u",
   children = [
     #(Element:0x3fc232060b60 {
       name = "span",
       attributes = [
         #(Attr:0x3fc2320603e0 {
           name = "style",
           value = "font-size: large;"
           })],
       children = [ #(Text "Ingredients:")]
       })]
   })]

[#(Element:0x3fc232060b60 {
   name = "span",
   attributes = [
     #(Attr:0x3fc2320603e0 { name = "style", value = "font-size: large;" })],
   children = [ #(Text "Ingredients:")]
   })]

[#(Text "Ingredients:")]

Notes: I'm using the mechanize gem, which is built on top of Nokogiri.
http://mechanize.rubyforge.org/Mechanize/Page.html#method-i-search
http://nokogiri.org/Nokogiri/XML/Node.html#method-i-search

Phrogz
Mr. Demetrius Michael

2 Answers

It sounds to me like you want Nokogiri's traverse method, which visits every node in the document:

doc.traverse do |node|
  drill << node
end
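
Note that traverse hands you every node but does not record how deep each one sits. If the goal is one array per level, keeping every matching node rather than only the first, a level-by-level loop is one option. The following is a minimal sketch and has not been tested against Mechanize. SketchNode and drill_levels are hypothetical names; SketchNode is only a stand-in so the code runs without Nokogiri. The sketch assumes each node responds to children and text, as Nokogiri nodes do.

```ruby
# Hypothetical stand-in for a parsed node, so this sketch runs without Nokogiri.
# Like a Nokogiri node, it responds to #children and #text.
SketchNode = Struct.new(:name, :own_text, :children) do
  # Text of this node plus the text of all its descendants.
  def text
    own_text.to_s + children.map(&:text).join
  end
end

# Returns an array of levels. Each level is an array holding every node
# at that depth whose text matches the pattern.
def drill_levels(roots, pattern)
  levels = []
  current = roots.select { |node| node.text[pattern] }
  until current.empty?
    levels << current
    # Descend into the matching children of every node, not just the first.
    current = current.flat_map do |node|
      node.children.select { |child| child.text[pattern] }
    end
  end
  levels
end

# Small made-up tree for illustration only.
tree = SketchNode.new("body", nil, [
  SketchNode.new("u", nil, [
    SketchNode.new("span", nil, [SketchNode.new("text", "Ingredients:", [])])
  ]),
  SketchNode.new("p", nil, [SketchNode.new("text", "Ingredients: flour", [])]),
  SketchNode.new("p", nil, [SketchNode.new("text", "Unrelated", [])])
])

drill_levels([tree], /Ingredients/).each_with_index do |level, depth|
  puts "#{depth}: #{level.map(&:name).join(', ')}"
end
```

With a Mechanize page, the starting point would presumably be something like drill_levels(page.search('body').to_a, search_q). The loop stops on its own when a level has no matching nodes, so neither eval nor an arbitrarily long upper bound is needed.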
pguardiario

Your question is not clear.

If, by "I want to drill down the tree, and store all the levels", you mean you want to traverse all the nodes, tell Nokogiri to do that.

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<a>
  <b>
    <c>1</c>
  </b>
</a>
EOT

doc.search('*').each do |n|
  puts n.name
end

Pasting that into IRB and grabbing the output:

irb(main):011:0* doc.search('*').each do |n|
irb(main):012:1*   puts n.name
irb(main):013:1> end
a
b
c

I used XML and you're using HTML, but that won't matter. To fit Mechanize, change doc to page.

the Tin Man