
I am trying to download a binary file via HTTP using the following Ruby script.

#!/usr/bin/env ruby
require 'net/http'
require 'uri'

def http_download(resource, filename, debug = false)
  uri = URI.parse(resource)
  puts "Starting HTTP download for: #{uri}"
  http_object = Net::HTTP.new(uri.host, uri.port)
  http_object.use_ssl = true if uri.scheme == 'https'
  begin
    http_object.start do |http|
      request = Net::HTTP::Get.new uri.request_uri
      Net::HTTP.get_print(uri) if debug
      http.read_timeout = 500
      http.request request do |response|
        open filename, 'w' do |io|
          response.read_body do |chunk|
            io.write chunk
          end
        end
      end
    end
  rescue Exception => e
    puts "=> Exception: '#{e}'. Skipping download."
    return
  end
  puts "Stored download as #{filename}."
end

However, it downloads the HTML source instead of the binary. When I enter the URL in a browser, the binary file is downloaded. Here is a URL for which the script fails:

http://dcatlas.dcgis.dc.gov/catalog/download.asp?downloadID=2175&downloadTYPE=KML

I execute the script as follows:

pry> require 'myscript'
pry> resource = "http://dcatlas.dcgis.dc.gov/catalog/download.asp?downloadID=2175&downloadTYPE=KML"
pry> http_download(resource,"StreetTreePt.KML", true)

How can I download the binary?

Redirection experiments

I found this redirection check, which looks quite reasonable. When I integrate it into the response block, it fails with the following error:

Exception: 'undefined method `host' for "save_download.asp?filename=StreetTreePt.KML":String'. Skipping download.

The exception does not occur in the "original" function posted above.

JJD
  • try replacing `w` with `wb` in this line `open filename, 'w' do |io|` – orde May 17 '13 at 19:51
  • @orde No change. I do not think it's about storing the data in binary mode. The problem seems to be that the request is redirected. – JJD May 17 '13 at 20:03
  • You need to show how you've incorporated the redirection. The example code for Net::HTTP works, so the problem is in your code. – the Tin Man May 17 '13 at 20:31
  • @theTinMan I added the imports and how I execute the script. - I am confused! Did you run my script and it works for you? – JJD May 17 '13 at 20:48

1 Answer


The documentation for Net::HTTP shows how to handle redirects:

Following Redirection

Each Net::HTTPResponse object belongs to a class for its response code.

For example, all 2XX responses are instances of a Net::HTTPSuccess subclass, a 3XX response is an instance of a Net::HTTPRedirection subclass and a 200 response is an instance of the Net::HTTPOK class. For details of response classes, see the section “HTTP Response Classes” below.
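To make the class hierarchy concrete, here is a small sketch that looks up the class Net::HTTP associates with a status code and checks its parent class. The helper name `response_class_for` is made up for this illustration, and it relies on the `Net::HTTPResponse::CODE_TO_OBJ` lookup table, which is more of an implementation detail than a documented entry point.

```ruby
require 'net/http'

# Hypothetical helper: return the response class Net::HTTP uses for a status code.
def response_class_for(code)
  Net::HTTPResponse::CODE_TO_OBJ[code.to_s]
end

ok    = response_class_for(200)
found = response_class_for(302)

puts ok     # Net::HTTPOK
puts found  # Net::HTTPFound

# Class comparison with < is true when the left side is a subclass of the right.
puts ok < Net::HTTPSuccess
puts found < Net::HTTPRedirection
```

Because `case` compares with `===`, a `when Net::HTTPRedirection` branch matches any 3XX response object, whatever its exact subclass.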

Using a case statement you can handle various types of responses properly:

def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'too many HTTP redirects' if limit == 0

  response = Net::HTTP.get_response(URI(uri_str))

  case response
  when Net::HTTPSuccess then
    response
  when Net::HTTPRedirection then
    location = response['location']
    warn "redirected to #{location}"
    fetch(location, limit - 1)
  else
    response.value
  end
end

print fetch('http://www.ruby-lang.org')
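One detail the documentation example does not cover: a server may send a relative `Location` header. The string in the question's error message (`save_download.asp?filename=StreetTreePt.KML`) looks like such a value, although that is an assumption, since the failing code is not shown. The sketch below resolves the header against the URI that was just requested and streams the final body to a file in binary mode. The names `resolve_location` and `download_following_redirects` are invented for this example, and it does not handle cookies or sessions, which some download pages may require.

```ruby
require 'net/http'
require 'uri'

# Resolve a Location header against the URI that produced it, so that
# relative values become absolute URIs. Absolute values pass through unchanged.
def resolve_location(base_uri, location)
  URI.join(base_uri.to_s, location)
end

# Hypothetical helper: follow redirects and stream the final body to a file.
def download_following_redirects(uri_str, filename, limit = 10)
  raise ArgumentError, 'too many HTTP redirects' if limit == 0

  uri = URI(uri_str)
  redirect_to = nil

  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(Net::HTTP::Get.new(uri.request_uri)) do |response|
      case response
      when Net::HTTPSuccess
        # 'wb' keeps binary data intact on every platform.
        File.open(filename, 'wb') do |io|
          response.read_body { |chunk| io.write(chunk) }
        end
      when Net::HTTPRedirection
        redirect_to = resolve_location(uri, response['location'])
      else
        response.value # raises an error for other non-2XX responses
      end
    end
  end

  return filename unless redirect_to

  warn "redirected to #{redirect_to}"
  download_following_redirects(redirect_to.to_s, filename, limit - 1)
end
```

It would be called as `download_following_redirects(resource, "StreetTreePt.KML")`. Passing a plain string where a URI object is expected is one way to end up with an `undefined method 'host'` error, which is why the redirect target is turned back into a URI here before it is reused.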

Alternatively, Ruby's OpenURI follows redirects automatically, as does the Curb gem. Typhoeus and HTTPClient probably do too.
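As a rough sketch of the OpenURI route, assuming Ruby 2.5 or later (where `URI.open` is available), the download can be reduced to copying one stream into another. The helper name `open_uri_download` is made up for this example.

```ruby
require 'open-uri'

# Hypothetical helper: fetch a URL with OpenURI, which follows redirects
# on its own, and copy the body into a local file in binary mode.
def open_uri_download(url, filename)
  URI.open(url) do |remote|
    File.open(filename, 'wb') do |local|
      IO.copy_stream(remote, local)
    end
  end
  filename
end
```

With the URL from the question this would be `open_uri_download(resource, "StreetTreePt.KML")`. OpenURI buffers the whole response before yielding it, so for very large files the streaming Net::HTTP approach may be the better fit.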

According to the code you show in your question, the exception you are getting can only come from:

http_object = Net::HTTP.new(uri.host, uri.port)

which is hardly likely since uri is a URI object. You need to show the complete code if you want help with that problem.

the Tin Man
  • Running the function I posted above does **not raise an exception**. Simply the HTML file is downloaded instead of the binary file. The exception mentioned in my post relates to the redirect experiments. I updated my post to clarify this fact. – JJD May 17 '13 at 22:14
  • @theTinMan I successfully tried OpenURI as you suggested. Nevertheless, I am still interested how to handle the redirect in my example. – JJD May 18 '13 at 21:21
  • The example given by the Net::HTTP docs for handling redirection works correctly so use that. – the Tin Man May 18 '13 at 22:10