
Like many others who have posted here, my Apache server is getting brutally hammered nearly to death by robots, most of which are good robots.

There is no way to change their crawl rate.

Does anyone have a suggestion for solving this problem? I thought perhaps of creating two groups, users and bots, and allocating 50% of resources to each.

Then, based on behaviour or User-Agent, anything identified as a bot would be assigned to the bots group, whose members share a maximum of 50% of total resources. If the bots try to take more, the system would slow them down proportionally so they never get more than 50%.

Does anyone know how to go about doing this, or some other method with the same goal? I am using CentOS and have very little experience with this.
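One way to realize the two-group idea on CentOS (with systemd) is to run a second Apache instance that serves only bot traffic, and to put that instance in its own cgroup via a systemd slice. CPU shares are proportional weights that are only enforced under contention, which matches the behaviour described above: bots are slowed down only when they compete with users for the CPU. A minimal sketch — the unit name `httpd-bots.service`, the port, and the weight are placeholders, not anything CentOS ships:

```
# /etc/systemd/system/bots.slice -- cgroup for the bot-facing Apache instance
[Slice]
CPUAccounting=true
CPUShares=1024        # relative CPU weight; only enforced when the CPU is contended

# /etc/systemd/system/httpd-bots.service.d/slice.conf -- drop-in assigning the instance to the slice
[Service]
Slice=bots.slice
```

The front-end instance then has to route bot requests to the throttled one, e.g. with mod_rewrite plus mod_proxy/mod_proxy_http (the User-Agent pattern here is illustrative and would need tuning):

```
# Front-end httpd.conf: proxy recognised bot user-agents to the second instance
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (bot|crawl|spider|slurp) [NC]
RewriteRule ^/(.*)$ http://127.0.0.1:8081/$1 [P]
```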

Ray S.
    You mentioned yourself that bots are good, so doing what you imply is only going to bottleneck the performance of the bot and potentially result in it not completing a request. I would personally work on improving the performance of your system as a whole rather than trying to do that. – Peter Dec 03 '14 at 06:40
  • @Peter we tried upgrading the hardware but the bots just come in stronger. We would like the server to slow down only the bots group if they exceed their 50% of allocated resources – Ray S. Dec 03 '14 at 06:42
  • That doesn't sound like something legit bots do. Just block their IPs. If you don't want to do that, I stand by my previous comment. – Peter Dec 03 '14 at 06:45
  • @Peter ok, but like I said, they take as much as they can get, so improving the system will not help. We have a site with many tens of thousands of pages with pictures – Ray S. Dec 03 '14 at 06:54
  • Actual numbers might be of help here. If you have a $2/month VPS and robots are chewing up 50% of your CPU, that's entirely different from having robots chew through half the CPU on a dedicated server with 24 cores. Likewise, a metric on the bots like connections/requests per second would go a long way to helping determine if this is normal bot traffic your machine *should* be able to handle, or if there's something amiss. – HopelessN00b Dec 03 '14 at 06:59
  • @HopelessN00b thanks, but is there not an Apache module or something which can accomplish this? We thought it was worth giving it a shot. – Ray S. Dec 03 '14 at 07:01
  • We have a good-sized dedicated server. Not sure of the details offhand, but it is a powerful machine with plenty of RAM. – Ray S. Dec 03 '14 at 07:18
  • http://serverfault.com/questions/540743/how-to-rate-limit-apache-server-on-ip-basis Have fun. Just know it's a poor infrastructure solution. – Peter Dec 03 '14 at 09:58
  • @Peter thanks, but mod_cband is great yet known to be buggy, and mod_evasive did not even work, at least for me. – Ray S. Dec 03 '14 at 11:48
  • Which "good" bots are these? – Michael Hampton Dec 03 '14 at 12:48
  • @MichaelHampton not sure. I was told by colleagues that they are bots we want. – Ray S. Dec 03 '14 at 18:18
  • They didn't mention any bots in particular?! – Michael Hampton Dec 03 '14 at 18:19
  • @MichaelHampton when I find out, I will let you know. – Ray S. Dec 03 '14 at 18:21
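Since mod_cband and mod_evasive didn't pan out, mod_qos may be worth a try: it can limit the number of concurrent requests from clients whose User-Agent matches a bot pattern, throttling crawlers without touching normal users. A sketch, assuming mod_qos is installed and loaded — the User-Agent pattern and the limit of 10 are placeholders to adjust:

```
# Tag bot requests with an environment variable (mod_setenvif)
SetEnvIfNoCase User-Agent "(bot|crawl|spider|slurp)" QS_IsBot=yes

# mod_qos: allow at most 10 concurrent requests carrying this variable;
# additional matching requests are rejected until one finishes
QS_EventRequestLimit QS_IsBot 10
```

Separately, for genuinely well-behaved crawlers, a `Crawl-delay` directive in robots.txt is honoured by some of them (e.g. Bing and Yandex, though not Google) and costs nothing to try.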

0 Answers