0

I'm writing an application in which the user needs to solve a reCAPTCHA image outside of a browser. Basically, they would see the image from a page like this: http://www.google.com/recaptcha/api/noscript?k=6Lf5YAcAAAAAAILdm73fp007vvmaaDpFb6A5HLJP, submit the solution, and the program would do the rest.

I am using Mechanize to automate interaction with that page, and for some reason it always gives me a 500 response code. I've tried setting the user agent with Mechanize, to no avail. I'm at a loss about what to do, because I've inspected (with Wireshark) the packets that Mechanize is sending and receiving and compared them to the ones sent when I use Chrome to get the code, and they look nearly the same! (Both are POST requests with the same parameters, sent to the same URL.)

I'm pretty sure it's something obvious, but still I've been battling it for hours and would appreciate some help.

EDIT: here is the code.

  require 'mechanize'
  require 'logger'

  agent = Mechanize.new do |i|
    i.user_agent_alias = 'Mac Safari'
    i.log = Logger.new 'captcha.log' # log requests and responses for debugging
  end
  # captcha_url is a URL like the one above
  agent.get(captcha_url) do |google_page|
    form = google_page.forms.first
    form.recaptcha_response_field = captcha_text # captcha_text is user input
    form.submit # this line is where the error occurs
  end
undur_gongor
HRÓÐÓLFR

3 Answers

1

Nearly the same? Try sending the exact same headers.

agent = Mechanize.new

headers = {
    "Content-Type" => "application/x-www-form-urlencoded",
    "User-Agent" => "MyAgent",
    "Referer" => "Bob"
}

agent.post(url, {:foo => 'bar'}, headers)

If that doesn't work, take a look at cookies.
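One way to see what "nearly the same" hides is to diff the two header sets programmatically. The following is only a sketch: `header_diff` is a hypothetical helper (not part of Mechanize), and it assumes you have copied the headers from the browser capture and from the script capture into plain hashes by hand.

```ruby
# Hypothetical helper: compare two captured header sets.
# Header names are compared case-insensitively; values are compared as
# stripped strings. Returns { name => [browser_value, script_value] } for
# every header that is missing on one side or differs between the two.
def header_diff(browser_headers, script_headers)
  normalize = lambda do |headers|
    headers.each_with_object({}) do |(name, value), out|
      out[name.to_s.downcase] = value.to_s.strip
    end
  end

  browser = normalize.call(browser_headers)
  script  = normalize.call(script_headers)

  (browser.keys | script.keys).sort.each_with_object({}) do |name, diff|
    diff[name] = [browser[name], script[name]] unless browser[name] == script[name]
  end
end
```

An empty result means the two header sets match; anything left in the hash (a different length, a missing cookie, and so on) is a candidate for the cause of the 500.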

pguardiario
  • Hmm, that didn't work either. What sort of thing would I look at as far as cookies go? – HRÓÐÓLFR Nov 18 '11 at 09:23
  • Check that they're the same in your script request as in your Chrome request. The point is to make the script request identical to the Chrome request. Wireshark isn't very friendly; you might try Fiddler or Charles Proxy to examine them side by side. – pguardiario Nov 18 '11 at 22:57
0

They might do some kind of browser recognition beyond checking the user agent, such as checking the order of the fields in the header. But that's just speculation. A code sample would help.

Reactormonk
0

Well, I've solved the problem. Apparently Mechanize was sending an incorrect 'Content-Length' header. Doing the request manually (with a POST from net/http) makes it work.
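For anyone who lands here, this is roughly what the manual request can look like. It is a minimal sketch, not my exact code: `build_captcha_post` and `submit_captcha` are names made up for this example, and the `fields` hash is assumed to hold whatever the form on the noscript page contains (for instance `recaptcha_response_field` plus any hidden fields you scraped from it).

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a form-encoded POST for the captcha page.
# `fields` is a hash of form field names to values (assumed to come from
# the noscript form). set_form_data encodes the body and sets Content-Type.
def build_captcha_post(captcha_url, fields)
  uri = URI.parse(captcha_url)
  request = Net::HTTP::Post.new(uri.request_uri)
  request.set_form_data(fields)
  [uri, request]
end

# Send the request and return the Net::HTTPResponse.
# Net::HTTP fills in Content-Length from the request body when sending.
def submit_captcha(captcha_url, fields)
  uri, request = build_captcha_post(captcha_url, fields)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end
```

Usage would be along the lines of `response = submit_captcha(captcha_url, 'recaptcha_response_field' => captcha_text)`, followed by a check of `response.code`.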

HRÓÐÓLFR