I've seen plenty of robots.txt stuff, and some mod_rewrite solutions that looked promising… but I haven't been able to find a simple way to block spiders / scrapers / whoever the hell I want to block. I'd rather do this by hostname / domain, since that seems simpler than relying on user-agents, etc…
For example, say I were to see this in Apache logs..
msnbot-207-46-192-48.search.msn.com - - [07/Dec/2011:23:01:41 -0500] "GET /%3f/$/bluebox/blog/2011/iphoto/ HTTP/1.1" 404 366
ok… I want to prevent *.search.msn.com from ever coming here - to any of my sites, in any of my folders, VHOST or otherwise…
Typically, I have MANY <VirtualHost *:80> blocks set up, and DO NOT want to have to repeat the config for each host.. In the same vein, I have many DocumentRoot's… so putting some file in each of them, aka .htaccess, really isn't an option..
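Ideally it'd be one snippet at server scope, included exactly once - something like this sketch (the conf/extra/blocked-hosts.conf filename is purely hypothetical):

# httpd.conf -- one global include, instead of N copies per vhost / docroot
Include conf/extra/blocked-hosts.conf

…and whatever lives in that file applies everywhere.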
I had been using something in httpd.conf that resembled…
RewriteEngine On
# shunt anything claiming to be "BadBot" elsewhere
RewriteCond %{HTTP_USER_AGENT} ^BadBot
RewriteRule ^(.*)$ http://go.away/ [L]
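If mod_rewrite is still the way to go, I'm guessing the hostname flavor would look roughly like this (just a sketch - as I understand it, %{REMOTE_HOST} only carries a name when HostnameLookups On is set, otherwise it's the bare IP):

RewriteEngine On
# block anything whose reverse-DNS name ends in .search.msn.com
RewriteCond %{REMOTE_HOST} \.search\.msn\.com$ [NC]
# no substitution -- just answer 403 Forbidden
RewriteRule ^ - [F,L]

But rewrite rules at server scope aren't inherited by VirtualHosts unless each one sets RewriteOptions Inherit, which lands me right back at per-vhost config…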
How can I use the hostnames that show up in the logs (courtesy of HostnameLookups On) to blanket-Deny any domain I so desire?
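For what it's worth, the closest thing I can picture is a single server-wide section - a sketch assuming Apache 2.2's mod_authz_host, and relying on sections in the main config being merged into every VirtualHost:

# httpd.conf, outside any <VirtualHost>
<Location />
    Order Allow,Deny
    Allow from all
    # a (partial) domain name here makes Apache do a double reverse
    # DNS lookup per request, so the hostname can't easily be spoofed
    Deny from .search.msn.com
</Location>

With Order Allow,Deny, everyone gets in except hosts matching a Deny line - which sounds like exactly the blanket I'm after, if it actually works that way.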