
How do different search bots interpret the * character in a Disallow statement of the robots.txt file? Do all of them treat it as "zero or more characters"?

Let's take the following example:

User-agent: *           
Disallow: /back-end*/*

What does the above code mean? Does it mean that any directory whose name starts with "back-end" won't be indexed, even if "back-end" is followed by any set of characters? And what about the * after the /? Is it good convention to write it?

Generally speaking, my question is about the use of * in the Disallow statement and whether all search engine crawlers treat it the same way.

apaul
CompilingCyborg

1 Answer


The original Robots Exclusion Standard does not mention the * character in the Disallow: statement. Some crawlers, such as Googlebot and Slurp, recognize patterns containing * as wildcards, while MSNbot and Teoma interpret it in different ways.

Michael Brown
  • Alright! Thanks. What are those different ways? And what is the * character actually used for? – CompilingCyborg Aug 27 '12 at 13:55
  • See the Wikipedia article for more: http://en.wikipedia.org/wiki/Robots_exclusion_standard – TheSteve Aug 27 '12 at 14:00
  • The * was designed for greedy selection in 'allow'. Search engines (indexes in general) exist to provide entry points to your information, so you normally disallow something only when a specific destination is known. Some info on the differences is here: ghita.org/search-engines-dynamic-content-issues.html – Michael Brown Aug 27 '12 at 14:02
  • This was helpful. So, from what I understood, using the * character in the Disallow statement is not recommended because it is not standardized. Or what do you think? And one more thing: what do you think about using * after /, as in /back-end/*? – CompilingCyborg Aug 27 '12 at 14:06
  • @CompilingCyborg: It's not needed, because `/back-end` already blocks all pages with URLs **starting with** "/back-end", so "/back-end-foobar" and "/back-end/foobar" and "/back-end.html" are blocked, too. – unor Oct 06 '12 at 05:43
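
To make the difference concrete, here is a minimal Python sketch of the two interpretations discussed above. It is not any crawler's actual implementation: `prefix_match` and `wildcard_match` are hypothetical helper names, and the wildcard rules (`*` matches any run of characters, a trailing `$` anchors the end of the URL) are an assumption modelled on how wildcard-aware crawlers such as Googlebot are commonly documented to behave.

```python
import re


def prefix_match(rule: str, path: str) -> bool:
    """Literal reading: the rule is a plain path prefix, * has no special meaning."""
    return path.startswith(rule)


def wildcard_match(rule: str, path: str) -> bool:
    """Assumed wildcard reading: * matches any run of characters, trailing $ anchors the end."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with ".*" where the rule had a *.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        return re.fullmatch(pattern, path, flags=re.DOTALL) is not None
    # Without $, the rule only has to match from the start of the path.
    return re.match(pattern, path, flags=re.DOTALL) is not None


if __name__ == "__main__":
    paths = ["/back-end-foobar", "/back-end/foobar", "/back-end.html", "/frontend/"]
    for rule in ["/back-end", "/back-end*/*"]:
        for path in paths:
            print(f"{rule!r:16} {path!r:20} "
                  f"literal={prefix_match(rule, path)} "
                  f"wildcard={wildcard_match(rule, path)}")
```

Under these assumptions, `Disallow: /back-end` blocks all three "/back-end…" URLs for both kinds of crawler, whereas `Disallow: /back-end*/*` blocks nothing for a literal-prefix crawler and, for a wildcard-aware one, only paths that contain a further slash after "/back-end". So the plain prefix is both the more portable and the broader rule.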