
Is there a way to output a Nokogiri::HTML::Document object in a pretty format (and not as HTML)? I want to see the object indented as the nesting levels go deeper, similar to what awesome_print does (I tried it, and it doesn't work for this). Thanks!

Currently, I instantiate the Nokogiri object in the console with the following commands:

irb(main):105:0> html = open("http://www.google.com")
=> #<Tempfile:/var/folders/kx/nwfjzgd153g071ykz0mtgd0r0000gp/T/open-uri20131225-35224-y57yx3>
irb(main):106:0> document = Nokogiri::HTML(html.read)

This produces the following hard-to-read blob:

=> #<Nokogiri::HTML::Document:0x3ff87d83d7d0 name="document" children=[#<Nokogiri::XML::DTD:0x3ff87d83d2f8 name="html">, #<Nokogiri::XML::Element:0x3ff87d83cf10 name="html" attributes=[#<Nokogiri::XML::Attr:0x3ff87d83ce98 name="itemscope">, #<Nokogiri::XML::Attr:0x3ff87d83ce84 name="itemtype" value="http://schema.org/WebPage">] children=[#<Nokogiri::XML::Element:0x3ff87d83c77c name="head" children=[#<Nokogiri::XML::Element:0x3ff87d83c4c0 name="meta" attributes=[#<Nokogiri::XML::Attr:0x3ff87d83c434 name="content" value="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for.">, #<Nokogiri::XML::Attr:0x3ff87d83c420 name="name" value="description">]>, #<Nokogiri::XML::Element:0x3ff87d83371c name="meta" attributes=[#<Nokogiri::XML::Attr:0x3ff87d8335b4 name="content" value="noodp">, #<Nokogiri::XML::Attr:0x3ff87d8335a0 name="name" value="robots">]>, #<Nokogiri::XML::Element:0x3ff87d8325c4 name="meta" attributes=[#<Nokogiri::XML::Attr:0x3ff87d832510 name="itemprop" value="image">, #<Nokogiri::XML::Attr:0x3ff87d8324e8 name="content" value="/images/google_favicon_128.png">]>, #<Nokogiri::XML::Element:0x3ff87d82f964 name="title" children=[#<Nokogiri::XML::Text:0x3ff87d82f6d0 "Google">]>, #<Nokogiri::XML::Element:0x3ff87d82f478 name="script" children=[#<Nokogiri::XML::CDATA:0x3ff87d82f248 "(function(){\nwindow.google=
..... (this goes on for a while)

Preferred output:

<Nokogiri::HTML::Document:0x3ff87d83d7d0 name="document" ...
  <Nokogiri::XML::Element:0x3ff87d83cf10 name="html"  ...
    <Nokogiri::XML::Attr:0x3ff87d83ce84 name="itemtype" value="http://schema.org/WebPage">] ...
    <Nokogiri::XML::Element:0x3ff87d82f964 name="title" ...
...

Thanks!

perseverance
  • Your preferred output is "Thanks!" ??? – tckmn Dec 25 '13 at 22:27
  • You'll need to override Nokogiri's `inspect` or `to_s` methods for Nokogiri::XML::NodeSet and/or Nokogiri::XML::Node. At that point you can make it look like anything you want. – the Tin Man Dec 29 '13 at 07:18

2 Answers


You can use Nokogiri::HTML::Document#to_html to pretty print your Nokogiri::HTML::Document object.

Since Nokogiri::HTML::Document extends Nokogiri::XML::Document, which in turn extends Nokogiri::XML::Node, you also have several other serialization methods (such as to_xml and to_xhtml) whose output format you can control with SaveOptions.

So do:

> document = Nokogiri::HTML(html.read)
> puts document.to_html
vee
  • I said I didn't want it in HTML format, because I want to see which nodes, attributes, etc. are in the Nokogiri::HTML::Document object. – perseverance Dec 25 '13 at 23:46

Use the awesome_print gem:

$ gem install awesome_print
$ irb

require 'open-uri'
require 'nokogiri'
require 'awesome_print'

# open-uri's URI.open; since Ruby 3.0, plain Kernel#open no longer fetches URLs
html = URI.open("http://www.google.com")
document = Nokogiri::HTML(html.read)

ap document

Unlike Nokogiri's to_html method, this also gives you indentation and syntax highlighting. It's not perfect, but much more usable than the default printout.

Arman H
  • This doesn't work. It still prints out a blob and doesn't indent. Have you tried this? I also said in my question that I had already tried this. – perseverance Dec 25 '13 at 23:44
  • I didn't understand what you were after until you edited your question with sample output. No, `awesome_print` won't do that. It does provide syntax highlighting for HTML, and indentation for other Ruby objects (although not for Nokogiri). – Arman H Dec 26 '13 at 00:41