
I'm using robots.txt files to keep crawlers away from specific pages. If I want to block crawling of everything inside myfolder at /folder/myfolder/, can I put a robots.txt at /folder/myfolder/robots.txt and write:

User-agent: *
Disallow: /

or will I have to put robots.txt in /robots.txt and set:

User-agent: *
Disallow: /folder/myfolder/

Which of the two is correct?

  • Possible duplicate of [robots.txt allow all except few sub-directories](http://stackoverflow.com/questions/28495972/robots-txt-allow-all-except-few-sub-directories) – unor Dec 05 '16 at 09:01

1 Answer


robots.txt rules are matched against URL paths, so if you had a project that was 3 directories deep, like this:

Home/
  /directory1/
    - file 1
    - file 2
    /directory2/
       - file 3

Putting in this:

User-agent: *
Disallow: /

will prevent crawling of any URL under www.yoursite.com/ (that is, your whole site), because every path starts with /.
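As a quick check of that rule, here is a minimal sketch using Python's standard urllib.robotparser module. The host example.com and the helper name can_crawl are placeholders chosen for illustration; they are not part of the original answer.

```python
from urllib.robotparser import RobotFileParser

# The rule set from above: block every path for every crawler.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

def can_crawl(rules: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)

# Every path on the host starts with "/", so nothing is crawlable.
print(can_crawl(BLOCK_ALL, "http://example.com/"))                  # False
print(can_crawl(BLOCK_ALL, "http://example.com/folder/page.html"))  # False
```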


Putting in something like this:

User-agent: *
Disallow: /directory1/

will prevent crawling of anything under your directory1 folder. So in our example, file 1, file 2, and directory2 (including file 3) will not get crawled.
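The prefix matching described here can be tried out with Python's standard urllib.robotparser. This is only a sketch: the host example.com, the file names, and the helper is_allowed are made up for the example, and real crawlers may differ from Python's parser in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The rule set from above: block only what lives under /directory1/.
RULES = """\
User-agent: *
Disallow: /directory1/
"""

def is_allowed(url: str, agent: str = "*") -> bool:
    """Check `url` against RULES the way a rule-abiding crawler would."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, url)

# Paths that start with /directory1/ are blocked, nested ones included.
print(is_allowed("http://example.com/directory1/file1.html"))             # False
print(is_allowed("http://example.com/directory1/directory2/file3.html"))  # False

# Paths that do not start with that exact prefix stay crawlable.
print(is_allowed("http://example.com/index.html"))              # True
print(is_allowed("http://example.com/directory10/other.html"))  # True
```

Note that the trailing slash is part of the prefix, which is why /directory10/ is not affected by the rule.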


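As for where to place it, I have always put it in the site's root directory, the same place you put your index.html file. Crawlers only request /robots.txt at the root of the host, so a file at /folder/myfolder/robots.txt would be ignored; use the second form from your question.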
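To illustrate why the location matters: a crawler derives the robots.txt address from the host, not from the folder of the page it is about to fetch. The sketch below shows that derivation with Python's urllib.parse; the helper name robots_url and the example.com URLs are assumptions made for this example.

```python
from urllib.parse import urljoin

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would consult for `page_url`."""
    # A leading "/" makes urljoin resolve against the host root,
    # discarding whatever folder the page itself lives in.
    return urljoin(page_url, "/robots.txt")

print(robots_url("http://example.com/folder/myfolder/page.html"))
# http://example.com/robots.txt
print(robots_url("http://example.com/index.html"))
# http://example.com/robots.txt
```

Both pages map to the same file, so a robots.txt stored inside /folder/myfolder/ is never looked up.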

knocked loose
  • Shouldn’t contain `www.yoursite.com` unless it’s actually part of the URL path (e.g., in a URL like `http://example.com/www.yoursite.com/directory1/`). – unor Dec 05 '16 at 09:00
    @unor fixed, we use a software that must remove that if it's placed in the field. Thanks for the notice! – knocked loose Dec 05 '16 at 13:57