
From the HTTP server's perspective.

– orph

5 Answers


You can read the official Verifying Googlebot page.

Quoting the page here:

You can verify that a bot accessing your server really is Googlebot (or another Google user-agent) by using a reverse DNS lookup, verifying that the name is in the googlebot.com domain, and then doing a forward DNS lookup using that googlebot name. This is useful if you're concerned that spammers or other troublemakers are accessing your site while claiming to be Googlebot.

For example:

    > host 66.249.66.1
    1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.

    > host crawl-66-249-66-1.googlebot.com
    crawl-66-249-66-1.googlebot.com has address 66.249.66.1

Google doesn't post a public list of IP addresses for webmasters to whitelist. This is because these IP address ranges can change, causing problems for any webmasters who have hard coded them. The best way to identify accesses by Googlebot is to use the user-agent (Googlebot).
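
A minimal sketch of that double lookup in C# (using System.Net.Dns; the helper name IsVerifiedGooglebot is mine, and error handling is omitted: Dns.GetHostEntry throws a SocketException when the IP has no PTR record):

    // Requires System, System.Linq, and System.Net.
    static bool IsVerifiedGooglebot(IPAddress ip)
    {
        // Reverse DNS lookup: resolve the IP to a host name.
        string hostName = Dns.GetHostEntry(ip).HostName;

        // The name must be in the googlebot.com domain.
        if (!hostName.EndsWith(".googlebot.com", StringComparison.OrdinalIgnoreCase))
            return false;

        // Forward DNS lookup: the name must resolve back to the same IP.
        return Dns.GetHostAddresses(hostName).Any(a => a.Equals(ip));
    }

Since this costs two DNS lookups, you'd cache the verdict per IP rather than repeating it on every request (see the comments below).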

– imgx64
  • Is there no way to query google.com or googlebot.com every so often over DNS to get the list of IPs or IP ranges? Doing this for every incoming request seems painful. Something like an MX record, but for A or AAAA records. – jjxtra Oct 08 '21 at 19:42
  • @jjxtra I would implement this with caching. If you only look up the IP addresses that you haven't looked up recently, it works very well. – Stephen Ostermiller Nov 05 '21 at 09:55

I have captured a Google crawler request in my ASP.NET application, and here's what its signature looks like:

    Requesting IP: 66.249.71.113
    Client: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

My logs show many different IPs for the Google crawler in the 66.249.71.* range, all geo-located in Mountain View, CA, USA.

A simple way to check whether a request comes from the Google crawler is to verify that it contains Googlebot and http://www.google.com/bot.html. As noted, many different IPs show up with the same requesting client, so I wouldn't recommend checking IPs; that's where the client identity comes into the picture, so verify the client identity (user-agent) instead.

Here's sample code in C#:

    // Request.UserAgent can be null, so guard before matching.
    string userAgent = (Request.UserAgent ?? string.Empty).ToLowerInvariant();

    if (userAgent.Contains("googlebot") || userAgent.Contains("google.com/bot.html"))
    {
        // Yes, it claims to be Googlebot.
    }
    else
    {
        // No, it's something else.
    }

It's important to note that any HTTP client can easily fake this.

– this. __curious_geek

You can now perform an IP address check against Googlebot's published IP address list at https://developers.google.com/search/apis/ipranges/googlebot.json

From the docs:

you can identify Googlebot by IP address by matching the crawler's IP address to the list of Googlebot IP addresses. For all other Google crawlers, match the crawler's IP address against the complete list of Google IP addresses.
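
A minimal sketch of that check in C#, assuming .NET 8's System.Net.IPNetwork for the CIDR matching and the file's { "prefixes": [ { "ipv4Prefix": ... } ] } layout; the helper name and the lack of caching are mine:

    // Requires System, System.Net, System.Net.Http, System.Text.Json,
    // System.Threading.Tasks, and .NET 8 for System.Net.IPNetwork.
    static async Task<bool> IsGooglebotIpAsync(IPAddress ip)
    {
        const string rangesUrl =
            "https://developers.google.com/search/apis/ipranges/googlebot.json";

        using var http = new HttpClient();
        using var doc = JsonDocument.Parse(await http.GetStringAsync(rangesUrl));

        // Each entry holds either an "ipv4Prefix" or an "ipv6Prefix" in CIDR form.
        foreach (var entry in doc.RootElement.GetProperty("prefixes").EnumerateArray())
        {
            foreach (var key in new[] { "ipv4Prefix", "ipv6Prefix" })
            {
                if (entry.TryGetProperty(key, out var prefix) &&
                    IPNetwork.Parse(prefix.GetString()!).Contains(ip))
                    return true;
            }
        }
        return false;
    }

In practice you'd download the list on a schedule and cache the parsed networks instead of fetching the JSON on every request.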

– galdin

If you're using the Apache web server, you could have a look at the access log (e.g. logs/access.log).

Then load Google's IPs from http://www.iplists.com/nw/google.txt and check whether any of them appear in your log.
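
A rough sketch of that cross-check in C#; it assumes Apache's common log format (client IP is the first field) and that the downloaded list holds one address per line with # comment lines, both of which you should verify against the actual files:

    // Requires System, System.IO, System.Linq, and System.Net.Http.
    using var http = new HttpClient();

    // Hypothetical path: adjust to your Apache installation.
    var logLines = File.ReadAllLines("logs/access.log");

    var googleIps = http.GetStringAsync("http://www.iplists.com/nw/google.txt")
        .Result
        .Split('\n')
        .Select(l => l.Trim())
        .Where(l => l.Length > 0 && !l.StartsWith("#"))
        .ToHashSet();

    // The client IP is the first space-separated field of each log line.
    foreach (var line in logLines.Where(l => googleIps.Contains(l.Split(' ')[0])))
        Console.WriteLine(line);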

– weberph

Based on this. __curious_geek's solution, here's the JavaScript version (note that window.navigator.userAgent is only available client-side, in the browser):

    if (window.navigator.userAgent.match(/googlebot|google\.com\/bot\.html/i)) {
      // Yes, it's Google bot.
    }
– Sam