
I have some code that does basically this, where urls is an array of URL strings. This is a distilled version, but it should show the point.

require 'rubygems'
require 'typhoeus'
require 'json'
require 'socket'
def hit_http_urls(urls)
  hydra = Typhoeus::Hydra.new
  hydra.disable_memoization

  urls.each do |url|
    req = Typhoeus::Request.new(url,
          :disable_ssl_peer_verification => true,
          :disable_ssl_host_verification => true,
          :ssl_version => :sslv3,
          :headers=>{'User-Agent' => 'athingy', 'Content-Type' => 'text/xml; charset=utf-8'},
          :timeout => 10)
    req.on_complete { |res|
      puts res.body.length
    }
    hydra.queue(req)
  end
  hydra.run
end

The problem is that one or more of the URLs can return a response several megabytes in size. Since this function runs in a loop over mostly the same set of URLs, I don't want to download those large bodies every time. Is there a way to stop receiving data after a hard limit, something like a :max_response_size option?

I've looked at the RubyDoc pages for Hydra and Typhoeus: http://rubydoc.info/github/dbalatero/typhoeus/master/Typhoeus/Hydra

http://rubydoc.info/github/dbalatero/typhoeus/master/Typhoeus/Request

http://rubydoc.info/gems/typhoeus/0.4.1/file/README.md

but they don't seem to describe a way to limit the response body size. Is this possible?

rdickeyvii

1 Answer


It's not possible with Typhoeus at the moment, but it is possible with Ethon. In this Gist I demonstrate how to provide a different receiver for the response body, a file handle in that case (line 12). You could instead provide a String-like object that refuses to accept more than X bytes.

With Ethon you lose the conveniences Typhoeus provides, but it's worth dropping down to that level in some cases.
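A String-like receiver that refuses to grow past a fixed size can be sketched in plain Ruby. This is a minimal sketch, assuming Ruby 2.0+ (for String#b); the class name CappedBuffer and its methods are hypothetical, and how you hand such an object to Ethon as the body receiver depends on your Ethon version (the Gist covers that part, not this sketch). Its write returns the number of bytes it actually kept, like IO#write. That matters because Ethon sits on libcurl, and in libcurl a write callback that reports handling fewer bytes than it was given makes the transfer abort with a write error, so a callback wrapper could use this return value to stop the download early.

```ruby
# Hypothetical size-capped body receiver (not part of Typhoeus or Ethon).
# It implements the small subset of the String/IO interface a body writer
# typically uses (#write and #<<) and drops everything past max_bytes.
class CappedBuffer
  attr_reader :max_bytes

  def initialize(max_bytes)
    @max_bytes = max_bytes
    @data = ''.b        # binary buffer, so the cap is counted in bytes
    @truncated = false
  end

  # Appends as much of +chunk+ as still fits under the cap and returns the
  # number of bytes actually accepted (0 once the buffer is full).
  def write(chunk)
    chunk = chunk.to_s.b
    room = [@max_bytes - @data.bytesize, 0].max
    accepted = [room, chunk.bytesize].min
    @data << chunk.byteslice(0, accepted)
    @truncated = true if accepted < chunk.bytesize
    accepted
  end

  # String-style append; returns self so calls can be chained.
  def <<(chunk)
    write(chunk)
    self
  end

  # True once no more bytes will be accepted.
  def full?
    @data.bytesize >= @max_bytes
  end

  # True if any received data was discarded because of the cap.
  def truncated?
    @truncated
  end

  def bytesize
    @data.bytesize
  end

  # Returns a copy of the (possibly truncated) body.
  def to_s
    @data.dup
  end
end

# Simulated streaming loop: stop as soon as the receiver rejects bytes.
buf = CappedBuffer.new(10)
["hello ", "world, ", "more data"].each do |chunk|
  break if buf.write(chunk) < chunk.bytesize
end
puts buf.to_s # prints "hello worl"
```

The loop at the end only simulates chunks arriving from the network. In a real transfer, the comparison between bytes received and bytes accepted is what a libcurl-style write callback would use to decide whether to abort.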

i0rek