3

If I request the following URL:

http://www.google.com/recaptcha/api/noscript?k=MYPUBLICKEY

I get the old no-script version of the captcha, containing an image of a Google street number, like this:

[image: captcha showing a real-world street number]

But if I do the same with HtmlUnit, I get a fake-looking version of the image, like this:

[image: captcha showing blackish distorted text]

It happens every time: a real-world street number in the browser and blackish distorted text in HtmlUnit. The public key is the same in both cases.

How can the Google server distinguish between a browser and HtmlUnit?

The HtmlUnit code is as follows:

final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_17);
final HtmlPage page = webClient.getPage("http://www.google.com/recaptcha/api/noscript?k=" + getPublicKey());
HtmlImage image = page.<HtmlImage>getFirstByXPath("//img");
ImageReader imageReader = image.getImageReader();

The process can be observed with Fiddler.

Suzan Cioc

2 Answers

0

How about setting the correct headers for your request? User-Agent is the key one here.

Headers are how the backend gets information about the client (Firefox, Chrome, etc.). What are they in your case? Set the correct headers, e.g. for Firefox:

        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0.1) Gecko/20100101 Firefox/8.0.1");
        conn.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");

This snippet is from my code using Apache HttpClient; you need to adapt it to your needs.
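
To make the header idea concrete, here is a minimal sketch that uses only the JDK. It assumes plain `java.net.HttpURLConnection`, which is where `setRequestProperty` is defined. The class name `BrowserHeaders` and the helper `openWithBrowserHeaders` are made up for illustration, and the header values are simply the ones quoted above. The sketch only prepares the request and prints what would be sent; it does not contact the server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class BrowserHeaders {

    // Header values copied from the snippet above; adjust them to the browser you want to imitate.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0.1) Gecko/20100101 Firefox/8.0.1";
    static final String ACCEPT =
            "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

    // Hypothetical helper: prepares (but does not send) a request with browser-like headers.
    static HttpURLConnection openWithBrowserHeaders(String url) {
        try {
            // openConnection() only creates the connection object; no network traffic happens yet.
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("User-Agent", USER_AGENT);
            conn.setRequestProperty("Accept", ACCEPT);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = openWithBrowserHeaders(
                "http://www.google.com/recaptcha/api/noscript?k=MYPUBLICKEY");
        // Print the headers the request would carry, so they can be compared with Fiddler's capture.
        System.out.println(conn.getRequestProperty("User-Agent"));
        System.out.println(conn.getRequestProperty("Accept"));
    }
}
```

In HtmlUnit itself, the comparable call should be `webClient.addRequestHeader(name, value)`. Whether matching the headers alone is enough to get the street-number image is not something this sketch can confirm, so compare both requests in Fiddler header by header.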

Antoniossss
0

I know this is an old post, but a good way is to use

WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER);

How did you solve your problem?

pg7812