I'm trying to scrape some data using the Mechanize library in Ruby, and I first have to get past a "Terms and Conditions" page. To do that, I'm clicking an "I agree" button.

require 'mechanize'

agent = Mechanize.new
agent.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
agent.get('https://apply.hobartcity.com.au/Common/Common/terms.aspx')

form = agent.page.form_with(:id => 'aspnetForm')
button = form.button_with(:name => 'ctl00$ctMain$BtnAgree')
page = form.submit(button)

But when I run the above code I get this error on the form submission step:

Uncaught exception: unsupported content-encoding: gzip,gzip

When I access that second page with a browser, the response headers are:

HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
X-UA-Compatible: IE=9,10,11
Date: Tue, 16 Feb 2016 22:44:27 GMT
Cteonnt-Length: 16529
Content-Encoding: gzip
Content-Length: 5436

I assume mechanize can work with gzip content encoding, so I'm not sure where the error is coming from. Any ideas what's going on here?

Ruby 2.1.7, mechanize 2.7.4.

Hugh Stimson

1 Answer

I didn't figure out what the actual cause of the problem was, but I was able to work around it by overriding the Content-Encoding header in a content-encoding hook before submitting the form:

agent.content_encoding_hooks << lambda { |httpagent, uri, response, body_io|
  response['Content-Encoding'] = 'gzip'
}
agent.submit(form, button)

No more error.
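
The hook above forces the header to 'gzip' on every response, including ones that were not compressed at all. A narrower variant might rewrite the header only when it is a repeated gzip token, which is what the error message ("gzip,gzip") suggests Mechanize saw. This is a minimal sketch, not a confirmed fix: the names normalize_content_encoding and fix_duplicate_gzip are made up for illustration, and it assumes the body was really compressed only once (which the workaround above seems to bear out).

```ruby
# Collapse a repeated gzip token such as "gzip,gzip" or "gzip, gzip"
# down to a single "gzip". Any other value (including nil) is returned
# unchanged. Hypothetical helper, not part of Mechanize.
def normalize_content_encoding(value)
  return value if value.nil?

  tokens = value.split(',').map { |token| token.strip.downcase }
  tokens.uniq == ['gzip'] ? 'gzip' : value
end

# Hook with the same four-argument shape as the workaround above.
# Only `response` is used; it needs to respond to [] and []=.
fix_duplicate_gzip = lambda do |_agent, _uri, response, _body_io|
  current = response['Content-Encoding']
  fixed = normalize_content_encoding(current)
  response['Content-Encoding'] = fixed unless fixed == current
end

# Registration, assuming `agent`, `form` and `button` from the question:
#   agent.content_encoding_hooks << fix_duplicate_gzip
#   page = agent.submit(form, button)
```

Responses with a plain 'gzip', 'deflate' or missing Content-Encoding pass through untouched, so the hook should be safe to leave in place for the rest of the scrape.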

Hugh Stimson