
I'm a bit vague on the precise syntax of robots.txt, but what I'm trying to achieve is:

  • Tell all user agents not to crawl certain pages
  • Tell certain user agents not to crawl anything

(basically, some pages with enormous amounts of data should never be crawled; and some voracious but useless search engines, e.g. Cuil, should never crawl anything)

If I do something like this:

User-agent: *
Disallow: /path/page1.aspx
Disallow: /path/page2.aspx
Disallow: /path/page3.aspx

User-agent: twiceler
Disallow: /

...will it flow through as expected, with all user agents matching the first block and skipping page1, page2 and page3, and twiceler matching the second block and skipping everything?

Carson63000

2 Answers


It would appear that you have a better understanding than you realize. :)

Justin Scott

Hmm, it depends on the crawler and whether it just goes on a first-match basis, i.e. twiceler might see the wildcard entry first and not check any further, so it would never reach the Disallow: / that applies to it.
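
As one data point, here is a quick sketch that feeds the exact file from the question to Python's standard-library parser, urllib.robotparser (the "SomeOtherBot" name is just a placeholder for any crawler other than twiceler). This particular parser only falls back to the * group when no named group matches the agent, so it still blocks twiceler everywhere; whether a given real-world crawler is that well-behaved is another question.

from urllib.robotparser import RobotFileParser

# The robots.txt proposed in the question, verbatim.
robots_txt = """\
User-agent: *
Disallow: /path/page1.aspx
Disallow: /path/page2.aspx
Disallow: /path/page3.aspx

User-agent: twiceler
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# twiceler matches its own group and is blocked from everything.
print(rp.can_fetch("twiceler", "/some/other/page.aspx"))     # False
# Any other agent falls back to the * group: blocked only on the listed pages.
print(rp.can_fetch("SomeOtherBot", "/path/page1.aspx"))      # False
print(rp.can_fetch("SomeOtherBot", "/some/other/page.aspx")) # True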

user45348