How can Google's server distinguish between a browser and HtmlUnit?

Question

If I request the following URL in a browser:

http://www.google.com/recaptcha/api/noscript?k=MYPUBLICKEY

I get the old no-script version of the captcha, containing an image of a Google street number.

But if I do the same with HtmlUnit, I get a different, fake-looking version of the image.

This happens every time: a real-world street number in the browser and blackish, distorted text in HtmlUnit. The public key is the same.

How can Google's server distinguish between a browser and HtmlUnit?

The HtmlUnit code is as follows:

final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_17);
final HtmlPage page = webClient.getPage("http://www.google.com/recaptcha/api/noscript?k=" + getPublicKey());
HtmlImage image = page.<HtmlImage>getFirstByXPath("//img");
ImageReader imageReader = image.getImageReader();

The process can be observed with Fiddler.

Most likely the User-Agent HTTP header. – Erwin Bolwidt, Apr 02 '15 at 12:16

score 0 · Answer 1 · answered Apr 02 '15 at 12:15

How about setting the correct headers for your request? User-Agent is the key one here.

Headers are how the backend gets client information (Firefox, Chrome, etc.), so what does your request send in your case? Set the correct headers, e.g. for Firefox:

        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0.1) Gecko/20100101 Firefox/8.0.1");
        conn.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");

This snippet is from my code using Apache HttpClient; you need to adapt it to your needs.
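For reference, `setRequestProperty` is a method of `java.net.URLConnection`, so the two lines in this answer can be tried with nothing but the JDK. The following is only a minimal sketch: the class name `BrowserLikeRequest` and the helper `open` are hypothetical, the header values are the ones quoted in this answer, and whether these headers alone change which captcha image Google serves is not confirmed here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper class, not part of HtmlUnit or of the original code.
final class BrowserLikeRequest {
    // Header values quoted in the answer (no leading space in the User-Agent).
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0.1) Gecko/20100101 Firefox/8.0.1";
    static final String ACCEPT =
            "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

    // Prepares a connection with browser-like headers; nothing is sent
    // until the caller reads the response (e.g. getInputStream()).
    static HttpURLConnection open(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("User-Agent", USER_AGENT);
            conn.setRequestProperty("Accept", ACCEPT);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn =
                open("http://www.google.com/recaptcha/api/noscript?k=MYPUBLICKEY");
        // Show the request headers that would be sent with this connection.
        System.out.println(conn.getRequestProperties());
    }
}
```

Comparing the headers printed here with what Fiddler shows for the real browser and for HtmlUnit is one way to check which headers actually differ between the two clients.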

score 0 · Answer 2 · answered Aug 01 '16 at 11:49

I know this is an old post, but a good way is to use:

WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER);

How did you solve your problem?

pg7812
