
Let's say I have a test folder (test.domain.com) and I don't want the search engines to crawl in it, do I need to have a robots.txt in the test folder or can I just place a robots.txt in the root, then just disallow the test folder?

– Pa3k.m (edited by janw)

3 Answers


Each subdomain is generally treated as a separate site and requires its own robots.txt file.
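For example, a minimal robots.txt that blocks all compliant crawlers could be served at the subdomain's own root (the URL here is the hypothetical one from the question):

```
# Served at http://test.domain.com/robots.txt
User-agent: *
Disallow: /
```

The file at http://domain.com/robots.txt has no effect on what crawlers may do on test.domain.com.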

– malexander (edited by janw)
  • Even if it's a virtual subdomain like the one ipage offers? – Pa3k.m Nov 28 '13 at 01:23
  • Likely; it depends on how it's set up, but an easy way to tell is to go to your URL and see which file is served: http://subdomain.domain.com/robots.txt – malexander Nov 28 '13 at 01:27

When the crawler fetches test.domain.com/robots.txt, that is the robots.txt file it will see. It will not see any other robots.txt file.
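You can see this behavior by feeding a crawler-style parser the rules that such a file would contain. This is a minimal sketch using Python's standard-library robotparser; the bot name and URLs are hypothetical, and the rules mimic a "disallow everything" robots.txt on the test subdomain:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical contents of http://test.domain.com/robots.txt
rules = [
    "User-agent: *",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

# Any path on the subdomain is off-limits to compliant crawlers.
print(rp.can_fetch("SomeBot", "http://test.domain.com/staging/page.html"))  # prints False
```

Whatever is returned from that one URL is the whole policy the crawler applies to the host.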

– randomusername

If your test folder is configured as a virtual host, you need a robots.txt in your test folder as well (this is the most common setup). But if you route the subdomain's traffic via an .htaccess file, you could modify it to always serve the robots.txt from the root of your main domain.
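A minimal sketch of that .htaccess approach, assuming Apache with mod_rewrite enabled and the hypothetical hosts test.domain.com and domain.com from the question:

```
RewriteEngine On
# For any host other than the main domain, redirect robots.txt
# requests to the main domain's copy.
RewriteCond %{HTTP_HOST} !^domain\.com$ [NC]
RewriteRule ^robots\.txt$ http://domain.com/robots.txt [R=302,L]
```

Note that this serves the main file via a redirect; major crawlers generally follow redirects for robots.txt, but verify the behavior for the crawlers you care about.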

Anyway, from my experience it's better to be safe than sorry: put a robots.txt file (especially one that denies access) on every domain you need to protect, and double-check that you're getting the right file when accessing:

http://yourrootdomain.com/robots.txt
http://subdomain.yourrootdomain.com/robots.txt
– Kleskowy
  • I have already thought of that, but won't they conflict with each other? – Pa3k.m Nov 28 '13 at 01:22
  • They won't, as the crawler does not think about how the file is generated or where it comes from. It just calls the URL and fetches what is returned. – Kleskowy Nov 28 '13 at 11:14