3

Yesterday I set up some software that tracks all HTTP requests across our network of websites. After analyzing the first day of traffic, we found nearly a dozen IPs that were flat-out harvesting our data. It's pretty obvious when one IP browses 300 pages in the space of an hour. I did a reverse lookup on these and the majority were from Singapore, China, etc., so they weren't search engine bots.
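
For illustration, here is a minimal sketch of the kind of per-hour, per-IP counting involved (not the tracking software we actually use; the log path and threshold below are placeholders):

```python
# Minimal sketch: flag IPs that request an unusually large number of pages
# per hour from a combined-format access log. The log path and the
# 300-requests-per-hour threshold are assumptions, not part of the question.
import socket
from collections import Counter

LOG_PATH = "/var/log/apache2/access.log"   # hypothetical path
THRESHOLD = 300                            # requests per hour considered suspicious

counts = Counter()
with open(LOG_PATH) as log:
    for line in log:
        parts = line.split()
        if len(parts) < 4:
            continue
        ip = parts[0]
        # Combined log format: the timestamp looks like [04/Nov/2010:21:52:07
        hour = parts[3].lstrip("[")[:14]    # e.g. "04/Nov/2010:21"
        counts[(ip, hour)] += 1

for (ip, hour), n in counts.items():
    if n >= THRESHOLD:
        try:
            host = socket.gethostbyaddr(ip)[0]   # reverse DNS lookup
        except socket.herror:
            host = "no PTR record"
        print(f"{ip} ({host}) made {n} requests during {hour}")
```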

Does anyone know of a service or website that maintains a list of bad IPs that should be blocked?

Joe
    You should consider limiting the number of requests per IP over a certain amount of time instead. That way you don't run the risk of penalizing people who are unlucky enough to be on a blacklist for whatever reason. – Radu Nov 04 '10 at 21:52
  • @Radu: What do you mean by limiting the number of requests for a certain amount of time? – Lorenzo Nov 04 '10 at 22:04
  • @Lorenzo It would depend on the platform, but here's a nice question and answer for ASP.NET MVC: http://stackoverflow.com/questions/33969/best-way-to-implement-request-throttling-in-asp-net-mvc – Radu Nov 04 '10 at 22:07
  • He probably means that if a user browses, say, 100 pages in an hour, then block them. While that works, there is too big a chance of blocking a legitimate search engine. – Joe Nov 04 '10 at 22:09
  • Yeah, that's exactly what I mean, but the key is to choose a reasonable value, which is obviously not that easy. If you want to avoid blacklisting an over-enthusiastic search engine, then you'll need a method to verify whether a user is a real search engine. With Googlebot you'll need to do a reverse DNS lookup as described here: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=80553 – Radu Nov 04 '10 at 22:21
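
To make the throttling idea from the comments concrete, here is a minimal sketch of in-memory per-IP rate limiting combined with the reverse-then-forward DNS check Google recommends for verifying a claimed crawler. The window size, request limit, and trusted domain list are placeholder assumptions, not recommendations:

```python
# Minimal sketch of per-IP throttling with a crawler verification step.
# Window size, limit, and the trusted domain suffixes are placeholders.
import socket
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 3600          # look at the last hour
MAX_REQUESTS = 100             # assumed limit per IP per window
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")

_history = defaultdict(deque)  # ip -> timestamps of recent requests

def is_verified_search_engine(ip):
    """Reverse-resolve the IP, check the host suffix, then forward-resolve
    the host and confirm it maps back to the same IP (per Google's guide)."""
    try:
        host = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return False
    if not host.endswith(TRUSTED_SUFFIXES):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except socket.gaierror:
        return False

def allow_request(ip, now=None):
    """Return True if this request should be served, False if throttled."""
    now = now or time.time()
    window = _history[ip]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    window.append(now)
    if len(window) <= MAX_REQUESTS:
        return True
    # Over the limit: let verified crawlers through, block everyone else.
    return is_verified_search_engine(ip)
```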

1 Answer

3

Yes, there is such a list of IPs, but it is dynamic, so there is no downloadable version. You can query it via DNS instead.

Have a look at the Http:BL of projecthoneypot.org: http://www.projecthoneypot.org/httpbl_api.php
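
As a rough illustration, here is a minimal sketch of an Http:BL lookup following the query format documented at the link above. The access key below is a placeholder; you get a real one by registering with Project Honey Pot:

```python
# Minimal sketch of an Http:BL DNS query. "myaccesskey" is a placeholder.
import socket

ACCESS_KEY = "myaccesskey"   # placeholder; obtain a real key from Project Honey Pot

def httpbl_lookup(ip):
    """Query Http:BL for an IPv4 address. Returns None if the IP is not listed."""
    reversed_ip = ".".join(reversed(ip.split(".")))
    query = f"{ACCESS_KEY}.{reversed_ip}.dnsbl.httpbl.org"
    try:
        answer = socket.gethostbyname(query)
    except socket.gaierror:
        return None              # NXDOMAIN: the IP is not listed
    _, days, threat, visitor_type = (int(o) for o in answer.split("."))
    return {
        "days_since_last_activity": days,
        "threat_score": threat,
        # bit flags: 1 = suspicious, 2 = harvester, 4 = comment spammer;
        # 0 means the IP is a known search engine
        "visitor_type": visitor_type,
    }

suspect = httpbl_lookup("203.0.113.7")   # documentation/example address
if suspect and suspect["visitor_type"] & 2:
    print("known harvester - block it")
```

On a hit you can combine the threat score with the visitor-type flags to decide whether to block outright or just throttle.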

mailq