Searching for specific information on robots.txt, I stumbled upon a Yandex help page‡ on this topic. It suggests that I could use the Host directive to tell crawlers my preferred mirror domain:
    User-Agent: *
    Disallow: /dir/
    Host: www.example.com
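For context, here is a minimal sketch (in Python, purely illustrative) of how a crawler that chooses to honor the non-standard Host directive might extract it from a robots.txt file. The fetching and parsing rules here are my own assumptions, not taken from any specification:

    # Illustrative only: extract the (non-standard) Host directive from a
    # robots.txt file. The parsing rules are an assumption, not a spec.
    from urllib.request import urlopen

    def preferred_host(robots_url: str) -> str | None:
        """Return the value of the first Host line, or None if absent."""
        with urlopen(robots_url) as resp:
            body = resp.read().decode("utf-8", errors="replace")
        for raw in body.splitlines():
            # Strip comments and surrounding whitespace.
            line = raw.split("#", 1)[0].strip()
            if line.lower().startswith("host:"):
                return line.split(":", 1)[1].strip()
        return None

    # Hypothetical usage; www.example.com stands in for a real site.
    print(preferred_host("https://www.example.com/robots.txt"))

Presumably, a crawler supporting the directive would treat the returned value as the canonical mirror when indexing.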
The Wikipedia article also states that Google understands the Host directive, but there wasn't much information on that (in fact, none).
At robotstxt.org, I didn't find anything on Host (or on Crawl-delay, which Wikipedia also mentions).
- Is it encouraged to use the Host directive at all?
- Are there any resources from Google on this specific robots.txt directive?
- How well is it supported by other crawlers?
‡ At least since the beginning of 2021, the linked entry no longer covers the directive in question.