I'm trying to get all the nodes from an HTML document using Nokogiri.
I have this HTML:
<html>
<body>
<h1>Header1</h1>
<h2>Header22</h2>
<ul>
<li>Li1</li>
<ul>
<li>Li1</li>
<li>Li2</li>
</ul>
</ul>
</body>
</html>
String version:
string_page = "<html><body><h1>Header1</h1><h2>Header22</h2><ul><li>Li1</li><ul><li>Li1</li><li>Li2</li></ul></ul></body></html>"
I created an object:
require 'nokogiri'
page = Nokogiri::HTML(string_page)
Then I tried to traverse it:
result = []
page.traverse { |node| result << node.name unless node.name == "text" }
result # => ["html", "h1", "h2", "li", "li", "li", "ul", "ul", "body", "html", "document"]
But I don't like the order of the elements. I need an array in the same order as the elements appear in the document:
["html", "body", "h1", "h2", "ul", "li", "ul", "li", "li"]
I don't need closing tags.
Does anybody have a better solution to accomplish this?