I'm creating a site with a large number of small tools. One of them uses a Python CGI script that calls the Natural Language Toolkit (nltk), which is disk- and CPU-intensive. The other services are less intensive.

On the client side, these are rate limited with JavaScript throttling, but a malicious or careless user writing a bot could connect directly and bypass it. I would like to prevent this at as low a level as possible, but the limit needs to be per-directory (the CGI scripts sit next to the index.html files).

I will also have a firewall that blocks abusive IPs making an excessive number of requests. I would like the rate limit itself to return a simple 429 response with an empty JSON object as the body.

I would also like to include other rules, such as a 30-second limit between receiving a request and transmitting the response, after which the connection is closed.

How can I implement this in a lighttpd server on Debian?

j0k
Brigand

1 Answer

To directly answer your question, take a look at lighttpd's ModEvasive (mod_evasive) and Traffic Shaping features.

As an alternative, or in addition, to what you mentioned, you should be able to throttle connections per second using iptables rules.

Finally, consider using some sort of application (message) queue mechanism. That way, no matter how many requests reach your web interface or API endpoint, the queue ensures that only the desired number of nltk tasks run in parallel.
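
As a rough illustration of the queue idea, here is a minimal in-process sketch using only the Python standard library. It is not a lighttpd feature and not a real message broker: `heavy_task`, `MAX_PARALLEL`, `MAX_PENDING`, `start_workers`, and `submit` are hypothetical names chosen for the example, and `heavy_task` merely stands in for the nltk call. A fixed number of workers drain a bounded queue, so at most `MAX_PARALLEL` tasks run at once, and a full backlog is reported to the caller instead of piling up.

```python
import queue
import threading
from concurrent.futures import Future

MAX_PARALLEL = 2   # assumed cap on tasks running at the same time
MAX_PENDING = 8    # assumed backlog size before new work is refused

task_queue = queue.Queue(maxsize=MAX_PENDING)


def heavy_task(text):
    # Hypothetical stand-in for the real nltk processing.
    return text.split()


def worker():
    # Each worker handles one task at a time, so the number of workers
    # is the upper bound on parallel tasks.
    while True:
        item = task_queue.get()
        if item is None:  # sentinel: stop this worker
            task_queue.task_done()
            break
        text, future = item
        try:
            future.set_result(heavy_task(text))
        except Exception as exc:
            future.set_exception(exc)
        finally:
            task_queue.task_done()


def start_workers(count=MAX_PARALLEL):
    threads = [threading.Thread(target=worker, daemon=True) for _ in range(count)]
    for thread in threads:
        thread.start()
    return threads


def submit(text):
    """Queue a task. Returns a Future, or None when the backlog is full."""
    future = Future()
    try:
        task_queue.put_nowait((text, future))
    except queue.Full:
        return None
    return future
```

A caller that gets `None` back from `submit` is the natural place to answer with a 429. Note that plain CGI starts a new process per request, so a queue like this would have to live in a separate long-running process (or a real message broker) that the CGI scripts talk to, and CPU-bound work may need worker processes rather than threads.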

Mxx
  • Thanks. ModEvasive looks good, and I've enabled it. I can't see a difference between it and Traffic Shaping. – Brigand Aug 11 '13 at 07:04
  • Do you know of any bidirectional queues? I need to get a response, so that I can pass it back to the client. I know I could have each write to a queue, but each must go back to the correct process. I've looked at several, and don't see any mechanism for this. Am I missing something? – Brigand Aug 11 '13 at 07:11
  • @FakeRainBrigand, In simplest terms you would have 2 queues: `request queue` and `reply queue`. Clients will submit to `request queue` and include something like "reply to" field to be used in `reply queue`. Workers will publish results to `reply queue` addressed to specific "reply to" which your client will poll for. Just google for 'python message queue' and you'll find plenty of useful results. :) – Mxx Aug 11 '13 at 07:38
  • I didn’t quite understand how this answer addresses the question until the last paragraph. The question asks how to rate limit a particular endpoint. mod_evasive rate limits per IP address. Traffic shaping does not rate limit; it limits bandwidth. Iptables limits connections, not HTTP endpoints. Using a queue is a solution to the problem, but it is not as simple as configuring lighttpd. –  Nov 12 '20 at 16:18
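
The request queue / reply queue pattern described in the comments above can be sketched in-process with the standard library. This is only an assumption-laden illustration: a real deployment would use a message broker shared between processes, and `process`, `call`, `replies`, and the message field names are hypothetical. Each request carries a unique `reply_to` id, and the caller waits only for the reply addressed to that id, which is how each result finds its way back to the correct requester. Waiting on a condition variable is used here in place of the polling mentioned in the comment.

```python
import queue
import threading
import uuid

request_queue = queue.Queue()
replies = {}                           # reply_to id -> result
replies_ready = threading.Condition()  # guards `replies`


def process(payload):
    # Hypothetical stand-in for the real work done by a worker.
    return payload.upper()


def worker():
    while True:
        message = request_queue.get()
        if message is None:  # sentinel: stop this worker
            break
        result = process(message["payload"])
        with replies_ready:
            # Address the result to whoever sent the request.
            replies[message["reply_to"]] = result
            replies_ready.notify_all()


def call(payload, timeout=30.0):
    """Send a request and wait for the reply addressed to this caller."""
    reply_to = uuid.uuid4().hex
    request_queue.put({"reply_to": reply_to, "payload": payload})
    with replies_ready:
        arrived = replies_ready.wait_for(lambda: reply_to in replies, timeout=timeout)
        if not arrived:
            raise TimeoutError("no reply within the timeout")
        return replies.pop(reply_to)
```

With a worker thread running `worker`, `call("some text")` blocks until the matching reply arrives or the timeout expires; the 30-second default simply mirrors the limit mentioned in the question.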