4

I have a website, say:

http://domain.com/

with a mirror site at:

http://cdn.domain.com/

I don't want the CDN to be indexed. How can I write a robots.txt rule that keeps the CDN from being indexed without disturbing my present robots.txt excludes?

My present robots.txt excludes:

User-agent: *
Disallow: /abc.php

How can I prevent cdn.domain.com from being indexed?

Sumit Bijvani
Yugal Jindle

2 Answers

10

In your root .htaccess file, add the following:

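RewriteEngine on
# Match requests whose User-Agent is "Amazon CloudFront" (the "." matches the space)
RewriteCond %{HTTP_USER_AGENT} ^Amazon.CloudFront$
# Serve robots-cdn.txt in place of robots.txt for those requests
RewriteRule ^robots\.txt$ robots-cdn.txt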

And then create a separate robots-cdn.txt:

User-agent: *
Disallow: /

When robots.txt is requested via http://cdn.domain.com/robots.txt, the contents of the robots-cdn.txt file are returned. Otherwise the rewrite won't kick in and the true robots.txt is served.

This way you are free to mirror the entire site (including the .htaccess file) with the expected behavior.

Update:

  • Matching on HTTP_USER_AGENT did it, since Amazon sends this user agent when querying the origin from any location.
  • I have verified that it works.
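One way to check the rule yourself is to request robots.txt from the origin while sending the CloudFront user agent. The following is only a sketch: the helper names `build_robots_request` and `fetch_robots` are made up for illustration, and it assumes the origin is reachable at http://domain.com/ and that CloudFront identifies itself as `Amazon CloudFront`, as the rewrite condition above expects.

```python
import urllib.request


def build_robots_request(base_url, user_agent):
    """Build (but do not send) a request for <base_url>/robots.txt
    that carries the given User-Agent header."""
    url = base_url.rstrip("/") + "/robots.txt"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_robots(base_url, user_agent, timeout=10):
    """Send the request and return the response body as text."""
    request = build_robots_request(base_url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_robots("http://domain.com", "Amazon CloudFront")` and then repeating the call with a browser-like user agent lets you compare the two responses. If the rewrite is active, only the first should contain the blanket `Disallow: /`.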
Yugal Jindle
Orangepill
  • Sir, I don't think this will work. The CDN is not a machine of its own, so if I ask the CDN for robots.txt, it will in turn request it from domain.com like a normal browser client. So we will get the same robots.txt (the one we don't want). – Yugal Jindle Jun 06 '13 at 07:34
  • The HTTP_HOST variable isn't referring to a network host. It refers to the host portion of a URL. You can view the same variable in PHP as the $_SERVER['HTTP_HOST'] superglobal. – Orangepill Jun 06 '13 at 07:47
  • You were right, although `HTTP_USER_AGENT` worked instead for `amazon`. I have made the required changes in the answer. Thanks. – Yugal Jindle Jun 06 '13 at 09:24
0

If the codebase is the same, you can generate your robots.txt dynamically and change its content depending on the requested (sub)domain.
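As a rough illustration of this idea, here is a minimal sketch in Python using a plain WSGI callable. The names `robots_txt_for_host`, `CDN_HOSTS` and `app` are hypothetical, and the sketch assumes the request reaches your application with the CDN hostname in its Host header. Per the comments on the other answer, that may not hold for CloudFront, where the user agent was what distinguished CDN requests.

```python
# Rules for the main site (taken from the question) and for the CDN mirror.
MAIN_ROBOTS = "User-agent: *\nDisallow: /abc.php\n"
CDN_ROBOTS = "User-agent: *\nDisallow: /\n"

# Hostnames that should be kept out of the index (assumed for this sketch).
CDN_HOSTS = {"cdn.domain.com"}


def robots_txt_for_host(host):
    """Return the robots.txt body for the given Host header value."""
    # Drop an optional ":port" suffix and normalise case before comparing.
    hostname = host.split(":", 1)[0].strip().lower()
    return CDN_ROBOTS if hostname in CDN_HOSTS else MAIN_ROBOTS


def app(environ, start_response):
    """Minimal WSGI app that only answers requests for /robots.txt."""
    if environ.get("PATH_INFO") != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]
    body = robots_txt_for_host(environ.get("HTTP_HOST", "")).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

The host check lives in its own function so the same logic can be reused from whatever framework or language the site actually runs on.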

ZeWaren