
I have:

  • domain.com
  • testing.domain.com

I want domain.com to be crawled and indexed by search engines, but not testing.domain.com.

The testing domain and main domain share the same SVN repository, so I'm not sure if separate robots.txt files would work...

Eric

2 Answers


1) Create a separate robots.txt file (name it robots_testing.txt, for example); a sketch of its contents appears after these steps.

2) Add this rule to the .htaccess file in your website's root folder:

RewriteCond %{HTTP_HOST} =testing.example.com
RewriteRule ^robots\.txt$ /robots_testing.txt [L]

This internally rewrites (no external redirect) any request for robots.txt to robots_testing.txt when the host name is testing.example.com.
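
For step 1, a minimal robots_testing.txt that blocks all crawlers could look like this (just a sketch; loosen it if some bots should still be allowed):

User-agent: *
Disallow: /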

Alternatively, do the opposite: rewrite all requests for robots.txt to robots_disabled.txt for every domain except example.com:

RewriteCond %{HTTP_HOST} !=example.com
RewriteRule ^robots\.txt$ /robots_disabled.txt [L]
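
If the www form of the testing host should be blocked as well, a hedged variation is to match both hosts in a single condition (example.com here is only a placeholder for the real domain):

RewriteCond %{HTTP_HOST} ^(www\.)?testing\.example\.com$ [NC]
RewriteRule ^robots\.txt$ /robots_testing.txt [L]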
LazyOne
  • What about the www version of testing.example.com - do I need a separate rule for that? – Eric Jul 19 '11 at 00:05

testing.domain.com should have its own robots.txt file, as follows:

User-agent: *
Disallow: /

User-agent: Googlebot
Noindex: /

It should be located at http://testing.domain.com/robots.txt.
This will disallow all bot user-agents, and since Google also looks at the Noindex directive, we'll just throw it in for good measure.

You could also add your subdomain to Webmaster Tools, block it by robots.txt, and submit a site removal request (though this will work for Google only). For more info, have a look at http://googlewebmastercentral.blogspot.com/2010/03/url-removal-explained-part-i-urls.html

Stephan
  • The files for both the testing and main domain are stored in the same SVN repository, and having separate robots.txt files would be pretty inconvenient... Any workaround to this? – Eric Jul 18 '11 at 21:42
  • Environment variables – matanster Oct 04 '13 at 13:20
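
One way to read that last suggestion, sketched here only as an assumption (Apache with mod_setenvif and mod_rewrite; the host pattern and file name are placeholders): set an environment variable from the Host header and key the shared .htaccess rule off it, so both sites can keep a single repository with one rule set:

SetEnvIf Host ^testing\. ROBOTS_BLOCKED
RewriteCond %{ENV:ROBOTS_BLOCKED} =1
RewriteRule ^robots\.txt$ /robots_disabled.txt [L]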