I'm trying to figure out how to get the size of a Nokogiri XML document. The document is being fetched with open:

 require 'open-uri'
 require 'nokogiri'
 Nokogiri::XML(URI.open(my_url))

Is it possible at this point to determine the size of the returned document? Are any HTTP headers retrievable, such as Content-Type or Content-Length?

randombits

2 Answers

Not with Nokogiri directly. If you want to know the Content-Length before downloading the file, you can send a HEAD request with Net::HTTP:

require 'net/http'
uri = URI('http://www.example.com/file.ext')
response = Net::HTTP.start(uri.host, uri.port) { |http| http.request_head(uri.request_uri) }
file_size = response['content-length']

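file_size is a String holding the size in bytes, or nil if the server does not send the header; response.content_length returns the same value as an Integer.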
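A slightly fuller sketch of the HEAD approach, assuming a plain http or https URL. `remote_size` is a hypothetical helper name, not part of any library; it relies on `Net::HTTPHeader#content_length`, which parses the header into an Integer and returns nil when the header is missing.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: ask the server for the size of a resource
# without downloading it. Returns an Integer (bytes) or nil.
def remote_size(url)
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https') do |http|
    # HEAD returns only the headers, so no body is transferred.
    http.request_head(uri.request_uri)
  end
  # Integer parsed from Content-Length, or nil if the header is absent.
  response.content_length
end
```

For example, `remote_size('http://www.example.com/file.ext')` returns the advertised size, and a nil result means you have to fall back to downloading the file.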

dierre
  • Both response['content-length'] and response['Content-Length'] come up empty. Any idea why? response['content-type'] works just fine. – randombits Aug 14 '11 at 17:54
  • That's probably your problem: http://stackoverflow.com/questions/4811829/use-ruby-to-get-content-length-of-urls – dierre Aug 14 '11 at 18:13
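When a server sends no Content-Length (one possible cause, assumed here rather than confirmed in the thread, is a response streamed with chunked transfer encoding), the only dependable number is the one you count yourself. A minimal sketch, with `count_body_bytes` as a hypothetical helper name, that streams a GET response and sums the chunk sizes without holding the whole body in memory:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: download the body in pieces and count its bytes.
# Intended as a fallback when the server omits Content-Length.
def count_body_bytes(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    total = 0
    http.request_get(uri.request_uri) do |response|
      # read_body with a block yields the body chunk by chunk.
      response.read_body { |chunk| total += chunk.bytesize }
    end
    total # the block's value is returned by Net::HTTP.start
  end
end
```

Net::HTTP transparently decompresses gzip-encoded responses by default, so the count reflects the decoded body rather than the bytes on the wire.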

You could try something like:

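require 'open-uri'
require 'nokogiri'
opened_url = URI.open(my_url)
opened_url.size # size of the downloaded body in bytes
doc = Nokogiri::XML(opened_url) # reuse the same IO instead of fetching the URL twice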
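open-uri can also answer the header part of the question: the object returned by `URI.open` is extended with `OpenURI::Meta`, which provides `content_type` and a `meta` hash of response headers with lowercased keys. A sketch, with `download_info` as a hypothetical helper name:

```ruby
require 'open-uri'

# Hypothetical helper: fetch a URL once and report its size and headers.
def download_info(url)
  URI.open(url) do |io|
    {
      bytes: io.size,                           # size of the downloaded body
      content_type: io.content_type,            # media type without parameters
      content_length: io.meta['content-length'] # String, or nil if not sent
    }
  end
end
```

Inside the same `URI.open` block you could also call `Nokogiri::XML(io)`, so the document is parsed from the one download that the size and headers came from.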

When checking the "size" of a string, keep in mind the encoding issues discussed in this post: http://zargony.com/2009/07/24/ruby-1-9-and-file-encodings
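Since Ruby 1.9, `String#size` counts characters while `String#bytesize` counts bytes, so the two differ as soon as the text contains non-ASCII characters. A small illustration, with `sizes` as a hypothetical helper name:

```ruby
# Character count vs. byte count of the same string (Ruby 1.9+).
def sizes(str)
  { chars: str.size, bytes: str.bytesize }
end

# "caf\u00e9" is 4 characters but 5 bytes in UTF-8 (é takes two bytes).
sizes("caf\u00e9")
```

For a parsed Nokogiri document, `doc.to_xml.bytesize` gives the byte size of the re-serialized XML, which may not match the downloaded bytes exactly.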

Mario Zigliotto