
We have an Umbraco site in a load-balanced environment and we need to make sure only the canonical URL gets crawled, not the individual production node URLs.

We only want example.com to be indexed, while the load-balanced nodes at production1.example.com and production2.example.com should not be.

Do I add a disallow rule for those hosts to the robots.txt, or add a robots meta tag to the head? Or is there another way to keep the load-balancing URLs out of crawlers' indexes?

random
Ingen Speciell

1 Answer

Best solution: don't make node-specific URLs publicly available at all (we usually use a local IP/port to check the site on a specific node).

Since those domains are already exposed, you can serve a different robots.txt depending on the requested host (using URL rewriting).
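As a sketch of that rewriting approach: Umbraco typically runs on IIS, where the URL Rewrite module can rewrite requests for /robots.txt to a disallow-all file whenever the host is a node-specific domain. The rule name, the `robots-disallow.txt` file name, and the hostname pattern below are assumptions to adapt to your environment:

```xml
<!-- web.config fragment; assumes the IIS URL Rewrite module is installed -->
<system.webServer>
  <rewrite>
    <rules>
      <rule name="NodeSpecificRobots" stopProcessing="true">
        <!-- Only match requests for robots.txt -->
        <match url="^robots\.txt$" />
        <conditions>
          <!-- Hypothetical node hostnames; adjust the pattern to your setup -->
          <add input="{HTTP_HOST}" pattern="^production[12]\.example\.com$" />
        </conditions>
        <!-- Serve a disallow-all file instead of the normal robots.txt -->
        <action type="Rewrite" url="robots-disallow.txt" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```

where `robots-disallow.txt` would contain a blanket block:

```
User-agent: *
Disallow: /
```

Requests for robots.txt on example.com fall through to the normal file, so only the node domains tell crawlers to stay away.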

marapet