
We currently have a Squid proxy server set up on our network at a high school. We send all traffic from student devices (iPads) through the proxy, both internally and externally. Traffic goes to our firewall and is passed from there to the proxy server. The proxy server then sends the traffic through our internet filter, which also sits at the firewall level.

The setup seems to work fine for all websites except Google. When searching Google, users are redirected to ipv4.google.com and required to enter a CAPTCHA before they can continue searching. Google apparently sees unusual activity coming from the proxy server.

Is there a way around this? Is there a better way to set up our proxy so that Google doesn't flag it as unusual activity? Another option might be to exclude Google traffic from the proxy entirely, but I'm not sure how to do that.

Any help would be appreciated.

Thanks!

Alex Brady
  • How frequently does each individual device see the captcha? – kasperd Aug 14 '14 at 17:29
  • It seems to be only once per session. If they put the device to sleep and get back on, it asks for the captcha again. – Alex Brady Aug 14 '14 at 19:23
  • 1
    It's good that it is only once per session. That means Google isn't punishing legitimate users from your IP, as soon as it has the information to distinguish them from whatever traffic from your IP, it finds suspicious. But Google shouldn't be able to tell if the device has been sleeping between two requests. Are you sure it isn't connected to how much time has past rather than whether the device has been sleeping? – kasperd Aug 14 '14 at 19:42
  • You are right; I'm not sure exactly what factor is causing the device to have to enter the captcha again. It could very well be time. We will have to do some more testing to find out. – Alex Brady Aug 15 '14 at 13:11

3 Answers


How to use the proxy for everything but Google

If you are hijacking the traffic to send it through a "transparent" proxy, then you could configure the device doing the hijacking to not hijack the IP ranges known to belong to Google. Then those requests won't go through the proxy.
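For example, if the interception is done with an iptables REDIRECT rule on a Linux gateway, exemptions could be inserted ahead of it along these lines. This is a minimal sketch: the two CIDR blocks are illustrative only, and the authoritative list of Google netblocks changes over time and should be looked up before relying on this.

    # Sketch: let traffic to Google-owned ranges bypass the intercepting
    # proxy. The CIDR blocks are examples only; fetch the current list
    # of Google netblocks before using this.
    iptables -t nat -I PREROUTING -p tcp --dport 80 -d 64.233.160.0/19 -j ACCEPT
    iptables -t nat -I PREROUTING -p tcp --dport 80 -d 74.125.0.0/16 -j ACCEPT
    # The existing interception rule stays after these, e.g.:
    # iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 3128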

If you are instead relying on a web proxy auto-config (PAC/WPAD) script, then you can update the script to consider the hostname before deciding whether to use the proxy or a direct connection.
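In a PAC script that check could look roughly like this. It is a sketch: the hostname patterns and the placeholder proxy address 10.0.0.1:3128 would need to match your environment.

    // Sketch of a PAC function that bypasses the proxy for Google.
    // "PROXY 10.0.0.1:3128" is a placeholder for your Squid server.
    function FindProxyForURL(url, host) {
        if (shExpMatch(host, "*.google.com") || host == "google.com") {
            return "DIRECT";              // Google goes straight out
        }
        return "PROXY 10.0.0.1:3128";     // everything else via Squid
    }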

However, if those requests then all go through a single NAT instead of a single proxy, Google may never notice the difference. It would still see all the same requests coming from a single IP, which may look equally suspicious whether they arrive through NAT or a proxy.

It may be that the only way to get a separate IP for each device is to use IPv6. Luckily Google supports IPv6, so if you deploy IPv6 on your network, Google will be able to tell requests from different devices apart.

Other approaches

There may be a single device, or a few devices, on your network flooding Google with abusive requests. Try to track down the devices making the most requests to Google and figure out whether they are legitimate. If you have a few machines infected by botnets, then cleaning them might solve the problem.
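One way to find the noisiest clients is to count requests to Google per client address in Squid's access log, for example like this (a sketch assuming Squid's default native access.log format, where field 3 is the client IP and field 7 the URL):

    # Sketch: top 20 clients by number of requests to Google domains,
    # assuming Squid's default native access.log format
    # (field 3 = client IP, field 7 = URL).
    awk '$7 ~ /google\./ {print $3}' /var/log/squid/access.log \
        | sort | uniq -c | sort -rn | head -20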

You could also double-check whether your proxy is sending Google all the information that could help it identify clients. If Google can tell which requests came from which device, it might block only the abusive devices rather than everything going through your proxy. For http you could ensure all requests to Google get an X-Forwarded-For header. Google might ignore that header entirely; you can really only find out by trying.
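In Squid this is controlled by the forwarded_for directive in squid.conf. Note that it only affects plain http requests that Squid actually proxies, not CONNECT-tunnelled https:

    # squid.conf: append each client's real address to the
    # X-Forwarded-For header on requests Squid forwards.
    forwarded_for on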

If the requests are made over https, there is not much you can do. You can't help Google identify individual clients, and you can't see which requests are performed, only their volume. However, the volume of requests might be enough to tell whether there are any abusive devices on your network.

kasperd

Google normally doesn't allow proxies to be used for searching, as it may impact their search results; I have tried it on several occasions, with the same result.

This may be because proxies can affect ranking positions on SERPs, or because Google wants everybody to connect directly without hiding who they are.

Asad

Google has switched all of its http traffic to https. This means that the X-Forwarded-For header cannot be modified by a proxy server unless the proxy is performing a man-in-the-middle https interception and producing an unauthorized certificate claiming to be *.google.com.
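For reference, that kind of interception in Squid (3.5 and later) is done with SSL-Bump, roughly like the sketch below. It assumes a locally generated CA at /etc/squid/ssl_cert/myCA.pem that every client device has been configured to trust; without that trust, every https site throws a certificate warning.

    # squid.conf sketch: intercept and re-encrypt https with a local CA.
    # Clients must trust myCA.pem, or every https site shows warnings.
    http_port 3128 ssl-bump \
        cert=/etc/squid/ssl_cert/myCA.pem \
        generate-host-certificates=on
    acl step1 at_step SslBump1
    ssl_bump peek step1
    ssl_bump bump all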

Believing that Google would trust that an IP address listed in the X-Forwarded-For header is not a bot is crazy. Any bot author could simply code their bot to put a random IP address in the X-Forwarded-For header, and Google would just have to trust it was not a bot.

Believing that Google would consider a proxy server that is hijacking the https session and spoofing the ssl/tls certificate trustworthy enough to vouch that its clients are not bots is equally crazy.

Has anyone actually demonstrated, in 2015, that the Google CAPTCHA stops appearing after hijacking the https session, inserting an X-Forwarded-For header, and spoofing the ssl/tls certificate?

Keith
  • There is a huge difference between trusting a header and using the header as a signal to identify bots. I do not know exactly which signals Google uses to identify bots, but I am pretty sure Google tries to use every useful signal it has. And Google could very well have the data needed to evaluate the usefulness of X-Forwarded-For as such a signal. I am not going to create a bot to hammer Google with abusive requests just to find out whether Google uses the X-Forwarded-For header to distinguish between the bot and legitimate requests. – kasperd Dec 17 '18 at 11:46