I know that to show a directory listing of my files in a browser, I can put this in .htaccess:

Options +Indexes

and to prevent Google and most bots from crawling my directory I can use

Options -Indexes

Is it possible to still allow a visible directory listing through a browser but prevent bot crawling/indexing solely with .htaccess?

RCNeil

1 Answer

Your .htaccess file cannot magically distinguish "real" users from "bot" users, because from the web server's perspective there is no distinction: both simply send HTTP requests.

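However, as a general rule, well-behaved bots respect the contents of robots.txt, while web browsers ignore it. Disallowing the directory there keeps compliant crawlers out while the listing stays visible to people.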

Alternatively, if you had some way of determining what was a bot and what was not, you could work that rule into your .htaccess configuration. A common tactic is to apply a set of RewriteRule directives that filter on the reported User-Agent header. For example, a user agent that contains the word "googlebot" is probably run by Google.
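As a rough illustration of the User-Agent filtering described above, here is a minimal .htaccess sketch. It assumes mod_rewrite is enabled and that the server's AllowOverride setting permits Options and rewrite directives in .htaccess. The three crawler tokens are only illustrative picks, not a complete or authoritative list.

```apache
# Keep the directory listing visible to ordinary browsers.
Options +Indexes

# Assumption: mod_rewrite is loaded and allowed in .htaccess.
RewriteEngine On

# Match requests whose User-Agent contains one of these tokens,
# case-insensitively. The token list is an illustrative assumption.
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|slurp) [NC]

# Answer any matching request under this directory with 403 Forbidden.
RewriteRule ^ - [F]
```

A browser still sees the listing, while a request that identifies itself with one of those tokens gets a 403. This only affects clients that report such a User-Agent honestly.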

User-Agents.org has a list of popular user-agent identifiers. But remember that the contents of this header are set by whoever runs the bot or browser, and can contain anything that person wants. Malicious users, for example, will typically copy the User-Agent string of a popular browser or a popular search engine. So you can't depend on this header to tell bots from people.

tylerl