
What are considered best practices for putting your site into maintenance mode during an update? I'm asking because I'm not fond of a site with over 60k pages indexed in Google suddenly returning 404 headers, effectively telling Google that the site has disappeared. I'd rather tell Google that the site is only down for a few hours, so Googlebot should do nothing for now and come back later.

Update: I just found this blog post on the official Google Webmaster Central blog, straight from the source: http://googlewebmastercentral.blogspot.com/2011/01/how-to-deal-with-planned-site-downtime.html

ChrisR
  • You know, you could accept one of the correct answers. ;-) – Ryan Chouinard Mar 04 '11 at 15:47
  • @RyanChouinard: Hold your horses ... I'm still pondering over which of you two gets my seal of approval :) – ChrisR Mar 04 '11 at 19:27
  • 2
    I voted to close this question because it is not a programming question and it is off-topic on Stack Overflow. Non-programming questions about your website should be asked on [webmasters.se]. In the future, please ask questions like this there. – Stephen Ostermiller Aug 20 '22 at 09:32

3 Answers


Redirecting with a 307 to your site-down page (or serving a maintenance page directly with a 503 status code) will make Googlebot come back later:

http://www.ivankristianto.com/web-development/programming/enable-maintenance-mode-with-htaccess/1619/
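Below is a minimal sketch of the 503-without-redirect variant discussed in the comments, assuming Apache with mod_rewrite enabled in an .htaccess file; /maintenance.html and the allowed IP address are placeholders, not part of the answer above.

    # Use the maintenance page only as the body of every 503 response.
    ErrorDocument 503 /maintenance.html

    RewriteEngine On
    # Let one admin IP through so the site can be tested during the update (placeholder IP).
    RewriteCond %{REMOTE_ADDR} !^203\.0\.113\.10$
    # Don't rewrite the maintenance page itself, or serving the ErrorDocument would loop.
    RewriteCond %{REQUEST_URI} !^/maintenance\.html$
    # Answer everything else with 503 Service Unavailable at the requested URL, no redirect.
    RewriteRule ^ - [R=503,L]

With this in place, every URL keeps responding at its own address, but with a 503 status and the maintenance page as the response body.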

James T
  • 307 is an HTTP/1.1 extension, but it is the more appropriate one. However, Google will probably treat 307 and 302 identically. – jishi Mar 04 '11 at 15:12
  • Would it be advisable to actually redirect to the maintenance page with a 307, or just display a maintenance page at the originally requested URL and send the 503 Service Unavailable headers? I'm currently doing the latter. – ChrisR Mar 04 '11 at 15:14
  • 1
    If you have users or automated processes making POST requests to the server during the maintenance outage, a 307 may cause confusion as the redirect behavior is only defined for GET and HEAD requests. – Ryan Chouinard Mar 04 '11 at 15:18
  • 1
    To address Bubby4j, it should be noted that Google honors most status codes according to the RFC. A 503 is a temporary outage, and would be treated as such. It will not cause a delisting. I've used that code on several projects. – Ryan Chouinard Mar 04 '11 at 15:21

It should be acceptable to use a rewrite or other redirect to push all traffic to a maintenance page that returns status 503 Service Unavailable. Per RFC 2616, a 503 should be used when:

The server is currently unable to handle the request due to a temporary overloading or maintenance of the server.

See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.4 for more information about the 503 status code.
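The same RFC section notes that the length of the delay may be indicated in a Retry-After header, so it is worth sending one alongside the 503 to tell crawlers roughly when to check back. A rough sketch, assuming Apache with mod_rewrite and mod_headers enabled; the page path and the 7200-second value are placeholders:

    # Serve the maintenance page as the body of the 503 (placeholder path).
    ErrorDocument 503 /maintenance.html

    RewriteEngine On
    # Don't rewrite the maintenance page itself, or the ErrorDocument lookup would loop.
    RewriteCond %{REQUEST_URI} !^/maintenance\.html$
    RewriteRule ^ - [R=503,L]

    # "always" makes the header apply to error responses such as this 503;
    # the value can be seconds or an HTTP-date.
    Header always set Retry-After "7200"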

Ryan Chouinard

You can do either of the following:

Add

    Disallow: /

in robots.txt during the update. This tells the bot not to index anything right now. However, beware of the risk that it starts dropping pages from the index. I don't think it would do that, but I'm not sure.

Use a RewriteRule that catches all requests and does a 302 Moved Temporarily redirect to your maintenance page. This is probably the safest bet.
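A rough sketch of that second option, assuming Apache with mod_rewrite enabled in an .htaccess file; /maintenance.html is a placeholder path, not something named in the answer:

    RewriteEngine On
    # Skip the maintenance page itself, otherwise the redirect would loop.
    RewriteCond %{REQUEST_URI} !^/maintenance\.html$
    # 302 Found ("Moved Temporarily") sends visitors and bots to the maintenance page
    # without implying that the original URLs have moved for good.
    RewriteRule ^ /maintenance.html [R=302,L]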

jishi
  • I have to think the Disallow is a dangerous, dangerous idea. Why WOULDN'T it start dropping you? – Scott Stafford Mar 04 '11 at 15:12
  • Bad idea ... I've experienced a whole domain disappearing from Google for over 2 days after having `Disallow: /` in my robots.txt for only half an hour. My boss wasn't happy back then :) – ChrisR Mar 04 '11 at 15:12
  • I agree that the second option is the safest bet; I should have explained that better. However, I read an article from Google where they claim that they won't drop pages directly unless there is a revocation request pending for them. I'll try to look it up. – jishi Mar 04 '11 at 15:14
  • They probably temporarily remove it from the index. I'm inclined to believe that because after 2 days the whole site, with all indexed pages, reappeared in an instant. – ChrisR Mar 04 '11 at 15:16
  • http://www.google.com/support/forum/p/Webmasters/thread?tid=3cc824aecb39aac1&hl=en. This isn't the exact article I read, but it's interesting how they say "FYI robots.txt controls crawling, not indexing, so even if you have Disallow: / it's possible for us to index some URLs (without crawling them), for example if a bunch of people linked to that URL" – jishi Mar 04 '11 at 15:20
  • @jishi: I removed the -1 I gave it thanks to your followup, though I would still never Disallow my livelihood. ;) – Scott Stafford Mar 04 '11 at 15:25
  • I agree with you; however, sometimes you don't have the ability to actually set up redirect rules but you can modify robots.txt, and then it might be the less bad alternative. – jishi Mar 04 '11 at 15:35