
I have the following in my .htaccess file:

Options +FollowSymlinks
#+FollowSymLinks must be enabled for any rules to work, this is a security 
#requirement of the rewrite engine. Normally it's enabled in the root and we 
#shouldn't have to add it, but it doesn't hurt to do so.

RewriteEngine on
#Apache scans all incoming URL requests, checks for matches in our .htaccess file 
#and rewrites those matching URLs to whatever we specify.

#allow blank referrers.
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?site\.com [NC]
RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?site\.dev [NC]
RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?dev\.site\.com [NC]
RewriteRule \.(jpg|jpeg|png|gif)$ - [NC,F,L]

# if a directory or a file exists, use it directly
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d


# otherwise forward it to index.php
RewriteRule . index.php

site.com is the production site.

site.dev is a localhost dev environment.

dev.site.com is a subdomain where we test live.

I'm aware that the following line will prevent the site from being indexed:

Header set X-Robots-Tag "noindex, nofollow"

cf. http://yoast.com/prevent-site-being-indexed/

My question, however, is perhaps fairly simple:

Is there a way to apply this line ONLY on dev.site.com, so that it doesn't get indexed?

MEM

1 Answer

Is there a way to apply this line ONLY on dev.site.com, so that it doesn't get indexed?

Yes, the most direct way is to put the Header line in the vhost config for dev.site.com. On its own, a Header set directive in an htaccess file has no host check attached to it, so it applies to every host served from that directory.
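If the vhost config is out of reach, one possible workaround is to flag requests by their Host header and make the header conditional on that flag. This is only a sketch: it assumes mod_setenvif and mod_headers are both enabled and allowed in .htaccess on the host, which nothing here confirms, and the variable name NOINDEX_HOST is made up for illustration.

```apache
# Assumption: mod_setenvif and mod_headers are loaded and permitted in .htaccess.
# Flag requests whose Host header is dev.site.com (optionally with a port).
# NOINDEX_HOST is a hypothetical variable name chosen for this example.
SetEnvIfNoCase Host ^dev\.site\.com(:[0-9]+)?$ NOINDEX_HOST

# Send the header only when the flag above was set for this request.
Header set X-Robots-Tag "noindex, nofollow" env=NOINDEX_HOST
```

Requests for site.com never get the flag, so they should be served without the X-Robots-Tag header.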

Alternatively, if you want to block bots by user agent, you can remove the Header set line and add some rules:

# request is for http://dev.site.com
RewriteCond %{HTTP_HOST} ^dev\.site\.com$ [NC]
# user-agent is a search engine bot
RewriteCond %{HTTP_USER_AGENT} (Googlebot|yahoo|msnbot) [NC]
# return forbidden
RewriteRule ^ - [L,F]

Note that this list of user agents isn't complete. You can go through the massive list of User-Agents and pick out all of the indexing robots, or at least the more popular ones.
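As a rough illustration of extending that list, the variant below adds a few other commonly seen crawler tokens (bingbot, Slurp, DuckDuckBot, Baiduspider, YandexBot). Which tokens to include is an assumption made for this example, and the result is still far from exhaustive.

```apache
# request is for dev.site.com
RewriteCond %{HTTP_HOST} ^dev\.site\.com$ [NC]
# user-agent contains one of a few well-known crawler tokens (illustrative, not complete)
RewriteCond %{HTTP_USER_AGENT} (Googlebot|bingbot|msnbot|Slurp|yahoo|DuckDuckBot|Baiduspider|YandexBot) [NC]
# return 403 Forbidden
RewriteRule ^ - [F]
```

Because it matches on User-Agent, this only affects crawlers that identify themselves honestly.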

Jon Lin
  • On the linked page they say that if "your site is running on Apache, and mod_headers is enabled (it usually is), you could add the following single line to your .htaccess file". I will probably try the second option, since I can't access the vhost file in a shared hosting environment. – MEM Oct 17 '12 at 22:02
  • I've added your last code and there are no issues with the htaccess. Good. I can still see the dev site appearing when searching on Google. Perhaps it is just a matter of time? – MEM Oct 18 '12 at 16:53
  • @MEM I'm not sure how you go about removing indexed pages from Google's database; you'll have to contact them about it. My guess is that they'll eventually be removed simply because the cached versions of your pages get old. – Jon Lin Oct 18 '12 at 18:22