Yahoo's robots.txt contains:
User-agent: *
Disallow: /p/
Disallow: /r/
Disallow: /*?
What does the last line mean? ("Disallow: /*?")
If it were a Perl regular expression, *? would mean:
*? Match 0 or more times, not greedily
http://perldoc.perl.org/perlre.html
However, robots.txt follows a really basic grammar, not regular expressions. Google's documentation describes it like this:
To match a sequence of characters, use an asterisk (*). For instance, to block access to all subdirectories that begin with private:
User-agent: Googlebot
Disallow: /private*/
To block access to all URLs that include a question mark (?) (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string):
User-agent: Googlebot
Disallow: /*?
To specify matching the end of a URL, use $. For instance, to block any URLs that end with .xls:
User-agent: Googlebot
Disallow: /*.xls$
You can use this pattern matching in combination with the Allow directive. For instance, if a ? indicates a session ID, you may want to exclude all URLs that contain them to ensure Googlebot doesn't crawl duplicate pages. But URLs that end with a ? may be the version of the page that you do want included. For this situation, you can set your robots.txt file as follows:
User-agent: *
Allow: /*?$
Disallow: /*?
The Disallow: /*? directive will block any URL that includes a ? (more specifically, it will block any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string).
The Allow: /*?$ directive will allow any URL that ends in a ? (more specifically, it will allow any URL that begins with your domain name, followed by a string, followed by a ?, with no characters after the ?).
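To make the matching rules above concrete, here is a minimal Python sketch. The helper names (compile_rule, rule_matches, is_allowed) are hypothetical and not part of any library. It assumes that * matches any run of characters, that a trailing $ anchors the end of the URL path, and that when both an Allow and a Disallow pattern match, the longer pattern wins. That precedence is modelled on how Google describes its own crawler; other robots may resolve conflicts differently.

```python
import re


def compile_rule(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' is a wildcard and a trailing '$' anchors the end;
    every other character (including '?') is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern, path):
    # re.match anchors at the start, so patterns are prefix matches
    # unless they end with '$'.
    return compile_rule(pattern).match(path) is not None


def is_allowed(rules, path):
    """rules is a list of (directive, pattern) pairs.

    Assumption: the longest matching pattern wins, and Allow wins a tie.
    A path that matches no rule is allowed.
    """
    best = None
    for directive, pattern in rules:
        if rule_matches(pattern, path):
            key = (len(pattern), directive == "Allow")
            if best is None or key > best[0]:
                best = (key, directive)
    return best is None or best[1] == "Allow"


if __name__ == "__main__":
    yahoo = [("Disallow", "/p/"), ("Disallow", "/r/"), ("Disallow", "/*?")]
    print(is_allowed(yahoo, "/search?p=robots"))  # has a '?', so blocked
    print(is_allowed(yahoo, "/news/index.html"))  # no rule matches

    session = [("Allow", "/*?$"), ("Disallow", "/*?")]
    print(is_allowed(session, "/page?"))          # ends in '?'
    print(is_allowed(session, "/page?sid=1"))     # characters after '?'
```

Note that the ? in a pattern is a literal question mark here, not a single-character wildcard as it would be in shell globbing.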
So basically, robots are prohibited from crawling any Yahoo! URL that contains a query string, which covers any kind of query or search.
Confusingly, this pattern-matching support is not listed in the RFC, http://www.robotstxt.org/norobots-rfc.txt
The best description is provided by Google, http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449