I want to strip the markup from my HTML file (local for now) and then output the text to a file. That part works, but it removes all the spacing. For example, I have an H1 tag with content and a P tag. When I run the code below, the stripped text is placed in the file, but it is all on a single line. I want it to be broken into multiple lines.

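require "nokogiri"

# Read the whole file up front so no file handle is left open.
my_html = File.read("./my_html.html")

# Append the text content (all markup stripped) to the output file.
File.open("./no_html.txt", "a+") do |file|
  file.puts Nokogiri::HTML(my_html).text
end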
Casey Clayton
Look at this [question](http://stackoverflow.com/questions/1898829/how-do-i-pretty-print-html-with-nokogiri). It might be what you are looking for. – Mircea Nov 22 '13 at 21:51

1 Answer

If you want to split the string returned by Nokogiri::HTML(my_html).text into fixed-size pieces, you can use String#scan:

> "abcdefghijklmnpqrstuvwxyzfdsafadfasfadsfafdasfadfasdfasdfasdfdsf".scan(/.{5}/)
 => ["abcde", "fghij", "klmnp", "qrstu", "vwxyz", "fdsaf", "adfas", "fadsf", "afdas", "fadfa", "sdfas", "dfasd"]
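Note that `/.{5}/` only matches full five-character groups, so the trailing "fdsf" in the example above is dropped. A minimal sketch of one way to keep the remainder and get one chunk per line, assuming fixed-width chunks are really what is wanted (`wrap_fixed` is a hypothetical helper name, not part of Ruby or Nokogiri):

```ruby
# Hypothetical helper: split text into chunks of at most `width` characters
# and join them with newlines. The {1,width} quantifier keeps the final
# short chunk that /.{5}/ would silently drop.
def wrap_fixed(text, width)
  text.scan(/.{1,#{width}}/).join("\n")
end

puts wrap_fixed("abcdefghijklm", 5)
# prints:
# abcde
# fghij
# klm
```

This breaks lines at arbitrary positions, including mid-word, so it is probably only useful when the width matters more than readability.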

If you want to beautify the HTML itself, use

 Nokogiri::HTML(my_html,&:noblanks)

as described in the Stack Overflow post that @Mircea linked in the comments.

Kenny Meyer