I am working on a site that will offer downloads to users; there will be around 2,000,000 files available for download.
We want to discourage people from crawling and taking all of these documents, so we would like to limit the number of requests we serve matching a certain URL pattern within a given time window. We are happy for the rest of the site to be crawled, so we don't want to limit that.
We are putting an exclusion in robots.txt to discourage well-behaved crawlers from fetching the files; we are more worried about malicious or misbehaving crawlers.
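
For completeness, the robots.txt exclusion is just a standard Disallow on the download path (the `/files/` prefix below stands in for our real path):

```
User-agent: *
Disallow: /files/
```
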
We would like to use Apache to limit the number of downloads of these documents to about 1 per minute per IP address.
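
To make the requirement concrete, the behaviour we are after is roughly what this untested sketch using mod_security (ModSecurity 2.x works with Apache 2.2) describes; the `/files/` prefix, the SecDataDir path and the `ip.downloads` variable name are placeholders for illustration, not anything we actually have in place:

```
<IfModule mod_security2.c>
    SecRuleEngine On
    # Persistent per-IP collections need a writable directory
    SecDataDir /var/cache/modsecurity

    # Track each client by IP address
    SecAction "phase:1,nolog,pass,initcol:ip=%{REMOTE_ADDR}"

    # Count requests under the download path; the counter expires 60 seconds after the last hit
    SecRule REQUEST_URI "@beginsWith /files/" \
        "phase:1,nolog,pass,setvar:ip.downloads=+1,expirevar:ip.downloads=60"

    # A second download inside the window from the same IP gets refused,
    # but only for requests under the download path (the chained rule)
    SecRule IP:DOWNLOADS "@gt 1" \
        "phase:1,deny,status:503,msg:'Download rate limit exceeded',chain"
        SecRule REQUEST_URI "@beginsWith /files/"
</IfModule>
```
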
Is there a best practice way to do this?
We are using CentOS with Apache 2.2.
There are a lot of similar questions to this one, but most of them seem to center on bandwidth limiting, which is not what I want.