We have an Umbraco website with several sub-domains, and we want to exclude one of them from being crawled by search engines for now. I tried to change my robots.txt file, but it seems I am not doing it right.

URL: http://mywebsite.co.dl/

Subdomain: http://sub1.mywebsite.co.dl/

My robots.txt content is as follows:

User-agent: *
Disallow: sub1.*

What have I missed?

amir moradifard

2 Answers

The following code will block http://sub1.mywebsite.co.dl/ from being indexed:

User-agent: *
Disallow: /sub1/ 

You can also add another robots.txt file in the sub1 folder with the following code:

User-agent: *
Disallow: /

and that should help as well.
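
One way to sanity-check a rule before deploying it is Python's standard urllib.robotparser, which evaluates rules the way a conforming crawler would. A minimal sketch, assuming the second robots.txt above is the one served to crawlers (the hostname comes from the question):

from urllib.robotparser import RobotFileParser

# "Disallow: /" blocks every path for every user agent.
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

print(rp.can_fetch("*", "http://sub1.mywebsite.co.dl/"))          # False: blocked
print(rp.can_fetch("*", "http://sub1.mywebsite.co.dl/any/page"))  # False: blocked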

Howli
  • In Umbraco, subdomains do not have separate folders. You can define as many HostNames as you want in the Umbraco backend and have several subdomains for your website. – amir moradifard Mar 10 '14 at 11:49

If you want to block anything on http://sub1.mywebsite.co.dl/, your robots.txt MUST be accessible at http://sub1.mywebsite.co.dl/robots.txt.

This robots.txt will block all URLs for every bot that supports robots.txt:

User-agent: *
Disallow: /
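
To verify what a crawler will actually see once the file is deployed, you can fetch and evaluate it from the subdomain exactly the way a bot does. A sketch using Python's urllib.robotparser, assuming the robots.txt above is live at the subdomain root:

from urllib.robotparser import RobotFileParser

# A bot crawling sub1 requests robots.txt from that exact host,
# so that is where we check it.
rp = RobotFileParser()
rp.set_url("http://sub1.mywebsite.co.dl/robots.txt")
rp.read()  # downloads the live file over HTTP, like a crawler would

# With "Disallow: /" in place, nothing on the subdomain is crawlable.
print(rp.can_fetch("*", "http://sub1.mywebsite.co.dl/any/page"))  # False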
unor
  • In Umbraco, subdomains do not have separate folders. You can define as many HostNames as you want in the Umbraco backend and have several subdomains for your website. – amir moradifard Mar 10 '14 at 11:49
  • @amirmoradifard: It doesn’t matter how it’s implemented on the backend. The only thing that matters is the URL that gets used by visitors. So if someone visits a page accessible on `sub.example.com`, the robots.txt **must** be accessible from `sub.example.com/robots.txt`. No other place will work (but you may redirect, I guess). – unor Mar 10 '14 at 12:40
  • It does matter, as long as the whole website contains one main root and one robots.txt. Howli's answer worked for me. – amir moradifard Mar 12 '14 at 10:45
  • @amirmoradifard: Howli's first code will not work. Disallowing `/sub1/` does *not* block `http://sub1.mywebsite.co.dl/` (it would block `http://sub1.mywebsite.co.dl/sub1/` and anything after that). – unor Mar 12 '14 at 13:24
  • I used his suggestion and changed it to Disallow: sub1.mywebsite.co.dl/*, and yes, it's working. – amir moradifard Mar 12 '14 at 13:27
  • @amirmoradifard: It’s working where? This is not valid according to the robots.txt specification. `Disallow` can not contain domains, it contains (beginnings of) URL paths. Your code would block `http://sub1.mywebsite.co.dl/sub1.mywebsite.co.dl/`. – unor Mar 12 '14 at 13:55
  • Yes, you are right IF you use Disallow: /sub1.mywebsite.co.dl/*, but by getting rid of the first slash, it works. At least in my case it's working! – amir moradifard Mar 12 '14 at 14:21
  • @amirmoradifard: May I ask how you know that it’s working? Even without the beginning slash, it would still be the URL **path**, not the host. So this should not work with any conforming bot/parser. – unor Mar 13 '14 at 13:20
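
For reference, a conforming parser treats the disputed Disallow value as a URL path prefix, not as a hostname, and this can be checked offline. A sketch with Python's urllib.robotparser, using the rule from the comments above:

from urllib.robotparser import RobotFileParser

# The rule from the comments: it looks like a host, but it is parsed as a
# path, and since it does not begin with "/" it matches no real URL path.
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: sub1.mywebsite.co.dl/*"])

# The subdomain's root and ordinary pages are not blocked by it.
print(rp.can_fetch("*", "http://sub1.mywebsite.co.dl/"))      # True: still crawlable
print(rp.can_fetch("*", "http://sub1.mywebsite.co.dl/page"))  # True: still crawlable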