
I run a large forum and, like everyone else, have issues with spammers/bots. There are huge lists of known spam IPs that you can download and use in .htaccess form, but my only concern is the file size. So I suppose the question is how big is too big, given it's going to be loaded for every request. Adding all the IPs brings the file to about 100 KB.

Is there an alternative with less overhead? Possibly doing it in PHP, or would that also cause heavy load due to reading the file and checking IPs on every request?

Any advice would be greatly appreciated.

Thanks,

Steve


8 Answers


There are often more efficient ways than IP bans. For example, hidden form fields that only bots will fill out, or requiring JavaScript or cookies for submitting forms.
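As a rough illustration (the field name "website" and the inline CSS are just made-up examples), a honeypot could look like this:

<!-- In the form template: hidden from humans via CSS, but visible to bots parsing the markup -->
<input type="text" name="website" value="" style="display:none" autocomplete="off">

<?php
// In the form handler: real users leave the hidden field empty,
// naive bots fill it in, so any non-empty value gets rejected.
if (!empty($_POST['website'])) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}
// ...continue with normal form processing...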

For IP banning, I wouldn't use .htaccess files. Depending on your webserver, it may re-read the .htaccess files on every request. I'd definitely add the IP bans to your webserver's vhost configuration instead. That way I'd be sure the webserver keeps them in RAM and doesn't read them again and again.
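A sketch of what that could look like in an Apache 2.2 vhost (hostname, paths and addresses are placeholders):

<VirtualHost *:80>
    ServerName forum.example.com
    DocumentRoot /var/www/forum

    <Directory /var/www/forum>
        # Stop Apache from scanning for .htaccess on every request
        AllowOverride None
        # With Order Deny,Allow anything not explicitly denied is allowed
        Order Deny,Allow
        Deny from 192.0.2.10
        Deny from 198.51.100.0/24
    </Directory>
</VirtualHost>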

Doing it via PHP would also be an option. This way, you could also easily limit the bans to forms, like registration in your forum.
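For instance, a minimal sketch at the top of the registration script, assuming a plain text file with one banned IP per line (the file name is made up):

<?php
// Load the ban list once per registration request and check the client IP.
$blocked = file('blocked_ips.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if (in_array($_SERVER['REMOTE_ADDR'], $blocked, true)) {
    header('HTTP/1.1 403 Forbidden');
    exit('Registration is not available from your address.');
}
// ...continue with the normal registration flow...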

Kissaki
  • Good point, IP blocking on just the registration form via PHP would probably make more sense. I use recaptcha on the registration form, but bots still get through on occasion. I kind of like the idea of requiring JavaScript on the form, but then some users surf without it turned on, I guess? – Steve Feb 04 '11 at 10:22
  • With cookies or JS being a requirement you'll have to add a check for them and display a note when they're disabled, so users know what to do. The note won't help the bots at all, so that's good. :) If JS and cookies are only required for forms, the check only has to be on those forms as well. – Kissaki Feb 04 '11 at 10:26

There are a few options:

  • You can store the block list in a database. It's more efficient to query there than to loop over the list in PHP.
  • You could pre-process the list with array_map("ip2long", $ips) to save memory and possibly lookup time (see the sketch after this list).
  • You could package the IP list into a regular expression, maybe run it through an optimizer (Perl's Regexp::Optimizer). A single PCRE test would again be faster than a foreach with strpos tests: $regex = implode("|", array_map("preg_quote", file("ip.txt", FILE_IGNORE_NEW_LINES)));
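A rough sketch of the ip2long() pre-processing idea (file name and details are assumptions), which reduces the runtime check to a single isset():

<?php
// Pre-process (ideally once, e.g. in a cron job and cached): convert the
// dotted IPs to integers and flip the array so the values become keys.
$ips     = file('ip.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$blocked = array_flip(array_map('ip2long', $ips));

// Runtime check: one hash lookup instead of scanning 100 KB of text.
if (isset($blocked[ip2long($_SERVER['REMOTE_ADDR'])])) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}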

But then, IP block lists are often not very reliable. Maybe you should implement the other two workarounds instead: hidden form fields to detect dumb bots, or captchas to block non-humans (not very user-friendly, but it solves the problem).

mario
  • Ah, good point about IP ban effectiveness. I've had this problem some time ago, and went the "keep the IP, block the user" way: http://stackoverflow.com/questions/3513445/keeping-a-troll-out-ip-bans-considered-harmful-what-to-use-instead – Piskvor left the building Feb 04 '11 at 09:51
  • @Piskvor: I had the same problem. While some IP ban lists can mitigate the issue, it only works against undedicated spammers. – mario Feb 04 '11 at 09:55

In .htaccess in your DocumentRoot, after:

Order Deny,Allow

Append a line:

Deny from <blacklisted IP>
Lex Podgorny

Well, you are building a database of addresses, right? Wouldn't it be useful to use a database product for it? If you don't have any yet, SQLite could be up to the task.
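For illustration, a minimal PDO/SQLite sketch (the table and file names are invented here):

<?php
// One-time setup: an indexed table of banned addresses.
$db = new PDO('sqlite:/var/www/forum/banned_ips.sqlite');
$db->exec('CREATE TABLE IF NOT EXISTS banned_ips (ip TEXT PRIMARY KEY)');

// Per-request (or per-registration) check: an indexed lookup, no full-file scan.
$stmt = $db->prepare('SELECT 1 FROM banned_ips WHERE ip = ? LIMIT 1');
$stmt->execute(array($_SERVER['REMOTE_ADDR']));
if ($stmt->fetchColumn()) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}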

Piskvor left the building
  • It's a static list and only the webserver will read it, so probably not. DBs come to full power when you have changing data and especially multiple data accessors. For a static IP-list check this would be overhead, although one could argue whether it really is too much overhead. (Although I just noticed when reading the other answer that querying for it may actually be more efficient in a DB, with indexed columns and all.) – Kissaki Feb 04 '11 at 09:51
  • @Kissaki: Ah, that depends on many variables (profiling is essential here). If the OP already has some sort of database, this would be easier. BTW, "DBs are not really useful for read-only data" is a common misconception from incorrectly inverting "files aren't good for concurrent read-write access" - using a DB can boost the read-only performance quite a bit (or kill it if it's done wrong, of course). Testing and measuring *in that specific environment* is the key - one size never fits all. – Piskvor left the building Feb 04 '11 at 09:56

Maybe you want to stop spam the good old-fashioned way: with a captcha?

I believe that a Mr. Albert Einstein once said: Problems cannot be solved at the same level of awareness that created them :)

Ryan Fernandes
  • Thanks for the response. However I already use recaptcha, but still get bots coming through. It does cut the amount down however. – Steve Feb 04 '11 at 10:19

Unless you already have problems with the load on your server, you will probably not notice the difference from a 100 KB .htaccess file. There may be faster alternatives, such as iptables, sorted IP lists that can be searched for matches more quickly (see the sketch below), or even a database (though the overhead of a single database query might cancel out the benefit of indexed tables), but it is probably not worth the effort unless you run a forum with high load.
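For example, a sketch of the sorted-list idea, assuming the list has been pre-converted to an ascending array of ip2long() values (how it is built and cached is left open):

<?php
// Binary search over a sorted array of integer IPs: O(log n) per lookup.
function ip_is_blocked(array $sorted, $ip) {
    $needle = ip2long($ip);
    $lo = 0;
    $hi = count($sorted) - 1;
    while ($lo <= $hi) {
        $mid = ($lo + $hi) >> 1;
        if ($sorted[$mid] === $needle) {
            return true;
        } elseif ($sorted[$mid] < $needle) {
            $lo = $mid + 1;
        } else {
            $hi = $mid - 1;
        }
    }
    return false;
}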

You can alternatively try captchas or similar. Everything in this direction comes at an expense, and nothing is 100% reliable.

yankee
  • A 10 KB .htaccess will be barely manageable; but if you have more than 1 visitor/minute, I would bet that a 100 KB .htaccess will be __very__ noticeable - it just does not scale well when oversized (not to mention that it's not the right tool for the job, never mind that you *can* do anything with a hammer). – Piskvor left the building Feb 04 '11 at 09:59

Don't use such IP lists. They're likely to get outdated and you might block the wrong requests. Just invest in good or better captchas and only block IPs from time to time if they're really doing some kind of denial of service attack.

initall

Why force the webserver to handle blocking users? I'd suggest using null routes instead (iptables will slow your server down as the number of blocked IP entries grows).

Read up on http://www.cyberciti.biz/tips/how-do-i-drop-or-block-attackers-ip-with-null-routes.html

To add such routes from PHP, see http://php.net/manual/en/function.shell-exec.php
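A very rough sketch of combining the two, assuming the PHP process is actually allowed to run the command (e.g. via a sudoers entry), which is a big assumption in itself:

<?php
// Add a kernel null route for an attacker's IP so the webserver never
// sees its traffic again. Requires root or an explicit sudo rule.
$ip = $_SERVER['REMOTE_ADDR'];
if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4)) {
    shell_exec('sudo /sbin/ip route add blackhole ' . escapeshellarg($ip));
}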

Peeter