6

Is it possible for Google or any other crawler to crawl and index a page that returns a 301 status code?

I have seen a page in Google's index that has returned a 301 for months. However, the cache date of that page in the index is from a few days ago.

Can Google just ignore the 301 and crawl the contents of the page?

user1721135
  • 1
    You cannot force Google to ignore the 301 http://webmasters.stackexchange.com/questions/34807/how-can-i-force-google-to-re-index-my-site – Fabien Sa Oct 26 '13 at 14:57
  • The question is whether Google can ignore it on its own; I don't want them to ignore it. – user1721135 Oct 26 '13 at 14:58

5 Answers

6

Normally Google crawls the page that's redirected to. Two possible explanations for the site you saw:

  • The site only displayed a 301 message in the page body instead of returning a proper 301 status in the HTTP headers.
  • The site redirected to another 301, which redirected to another 301, and so on (a redirect chain).
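Both explanations can be checked by looking at the raw status codes without letting the client follow redirects. The following is a minimal sketch, not a description of how Google crawls: `fetch_once` and `trace_redirects` are hypothetical helper names, and the 10-hop limit is an arbitrary assumption. If the first hop reports 200 while the page text says "301", the first explanation applies; a long list of 301 hops points to the second.

```python
import http.client
from urllib.parse import urlsplit, urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_once(url, user_agent="redirect-check/0.1"):
    """Send one GET without following redirects.

    Returns (status code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=fetch_once, max_hops=10):
    """Follow Location headers hop by hop and return [(url, status), ...].

    Stops at the first non-redirect response, at a loop, or after
    max_hops requests (an arbitrary limit chosen for this sketch).
    """
    hops = []
    seen = set()
    while len(hops) < max_hops:
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        seen.add(url)
        url = urljoin(url, location)  # Location may be relative
        if url in seen:
            break  # redirect loop
    return hops
```

Calling `trace_redirects("http://example.com/old-page")` (a placeholder URL) returns one `(url, status)` pair per hop, so a chain of 301s shows up as several entries before the final response.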

Watch this video on YouTube.

Timmy D'Hooghe
3

Google always crawls the target of a redirect; HTTP 301 is no exception. I could not find a better source than one employee's discussion post, though. The Google Search Appliance documentation says the same, and I don't see why GSA and Googlebot should handle redirects differently.

Palec
  • That sounds plausible. But would it be possible for them not to follow the 301 and crawl the page itself instead (not the target)? The cache date says they crawled the page even though it returns a 301. – user1721135 Oct 29 '13 at 07:34
  • 2
    @user1721135 From HTTP's point of view, there is no technical need to follow a redirect. The standard says that [clients with link editing capability ought to automatically re-link](http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.2). Notice that “ought to” is not normative. However, “Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).” is clearly normative; SHOULD is a strong recommendation as defined in [RFC 2119](http://www.ietf.org/rfc/rfc2119.txt). Why index a virtually empty page? – Palec Oct 29 '13 at 12:49
  • @user1721135 What I'm trying to say is: Google doesn't publish such detailed information. The observed behavior could be a bug. It clearly has something to do with caching, where many performance tweaks are involved. Yes, I think it is possible that Google indexes the old URL even after crawling it and realizing it is a 301 redirect. It is possible in the same way it is possible that you own a cat. If you've never written about it on the net, there is no evidence; one could only guess your motivation. – Palec Oct 29 '13 at 13:00
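To make the quoted passage concrete, here is a minimal sketch of a server that sends a 301 the way the standard recommends: the status code and a `Location` header, plus a short hypertext note linking to the new URI. The `/old` to `/new` mapping, the `RedirectHandler` class, and the `make_server` helper are all made up for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping of moved paths to their new locations.
MOVED = {"/old": "/new"}


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = MOVED.get(self.path)
        if target is None:
            body = b"<p>Current content.</p>"
            self.send_response(200)
        else:
            # Short hypertext note with a hyperlink to the new URI.
            note = f'<p>Moved permanently to <a href="{target}">{target}</a>.</p>'
            body = note.encode("utf-8")
            self.send_response(301)
            self.send_header("Location", target)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    """Create (but do not start) a local server; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), RedirectHandler)
```

Calling `make_server(8000).serve_forever()` would serve it locally. A client that does not follow the redirect sees only the one-line note, which is the "virtually empty page" referred to above.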
2

Google visits URLs forever irrespective of what response code you return. They do this just in case a URL ever comes back to life with real content.

The 301 is the best response. Google will drop those URLs from the SERPs eventually. Don't force a quicker drop unless you want fewer visitors to your site for the next three to six months.

Shivanshu
2

According to Matt Cutts, the head of Google's webspam team, people have used 301s to abuse rankings by forwarding a bunch of domains to a new one, and Google has therefore improved how it handles 301 pages. Let us say you moved to a new domain and 301'd all of your pages from the old domain to the respective pages on the new domain. In this case, Google will eventually phase the old domain out of the index and bring the new one in.

What you are describing is rare, and if you are worried about it you can let Google know via the Google Webmaster Forums. They are pretty quick to act on things like this once it gets someone's attention. It could, however, be that the page removes the 301 from time to time and then puts it back. Or it could be that the 301 is not served to Googlebot.
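One rough way to test the last possibility is to request the URL with different `User-Agent` headers and compare the responses. This is only a sketch under assumptions: `status_for` and `compare_user_agents` are hypothetical helpers, the browser string is just an example, and a site that identifies Googlebot by IP address rather than by header will not be detected this way.

```python
import http.client
from urllib.parse import urlsplit

# User-Agent strings used for the comparison. The Googlebot string is the
# commonly published one; the browser string is only an example.
AGENTS = {
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "browser": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
}


def status_for(url, user_agent):
    """Return (status, Location header) for one GET, without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def compare_user_agents(url, agents=AGENTS, fetch=status_for):
    """Fetch url once per user agent.

    Returns (results, consistent), where results maps each agent name to
    its (status, location) pair and consistent is True if all pairs match.
    """
    results = {name: fetch(url, ua) for name, ua in agents.items()}
    consistent = len(set(results.values())) <= 1
    return results, consistent
```

If `consistent` comes back `False`, for example a 301 for the browser but a 200 for the Googlebot string, that would fit the cache date the question describes. Repeating the check over several days would also cover the case where the 301 is removed and later restored.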

dinwal
2

You can use Google Webmaster Tools: https://www.google.com/webmasters/tools/home

There is a robots-analysis tool where you can test your domain's URLs and see for yourself whether a 301-redirected page is being crawled or not ;)

gelleby