
I have Apache and ModSecurity working together. I'm trying to limit the hit rate by request header (for example, a User-Agent like "facebookexternalhit") and then return a friendly "429 Too Many Requests" with "Retry-After: 3".

I know I can match the header against a file of values like:

SecRule REQUEST_HEADERS:User-Agent "@pmFromFile ratelimit-bots.txt"

But I'm having trouble building the full rule.

Any help would be really appreciated. Thank you.

Luciano Fantuzzi
  • Please specify what you have tried so far. Did you manage to rate limit by IP? Check this: https://gist.github.com/josnidhin/91d1ea9cd71fde386c27a9228476834e – yeya Dec 11 '18 at 16:26
  • Yes, and I've just discovered how to do it. You can see my answer below. – Luciano Fantuzzi Dec 12 '18 at 05:15

2 Answers


After 2 days of researching and learning how ModSecurity works, I finally did it. FYI, I'm using Apache 2.4.37 and ModSecurity 2.9.2. This is what I did:

In my custom rules file, /etc/modsecurity/modsecurity_custom.conf, I've added the following rules:

# Limit client hits by user agent
SecRule REQUEST_HEADERS:User-Agent "@pm facebookexternalhit" \
    "id:400009,phase:2,nolog,pass,setvar:global.ratelimit_facebookexternalhit=+1,expirevar:global.ratelimit_facebookexternalhit=3"
SecRule GLOBAL:RATELIMIT_FACEBOOKEXTERNALHIT "@gt 1" \
    "chain,id:4000010,phase:2,pause:300,deny,status:429,setenv:RATELIMITED,log,msg:'RATELIMITED BOT'"
    SecRule REQUEST_HEADERS:User-Agent "@pm facebookexternalhit"
Header always set Retry-After "3" env=RATELIMITED
ErrorDocument 429 "Too Many Requests"

Explanation:

Note: I want to limit to 1 request every 3 seconds.

  1. The first rule matches the User-Agent request header against "facebookexternalhit". If the match is successful, it creates the ratelimit_facebookexternalhit variable in the global collection with an initial value of 1 (and increments it on every hit matching that user agent). It also sets the expiration time of this variable to 3 seconds. If we receive another hit matching "facebookexternalhit", 1 is added to ratelimit_facebookexternalhit; if we receive no matching hits for 3 seconds, ratelimit_facebookexternalhit is removed and the process starts over.
  2. If global.ratelimit_facebookexternalhit > 1 (we received 2 or more hits within 3 seconds) AND the user agent matches "facebookexternalhit" (this AND condition is important, because otherwise every request would be denied once the counter is exceeded), we set the RATELIMITED environment variable, deny the request with an HTTP 429, and log a custom message in the Apache error log: "RATELIMITED BOT".
  3. The RATELIMITED environment variable is set only to add the custom header "Retry-After: 3". This header is understood by Facebook's crawler (facebookexternalhit), which will retry the request after the specified time.
  4. We map a custom response message (if we want one) for the 429 error. You can verify the behaviour with curl, as shown below.
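To test this you can fake the user agent with curl (example.com here just stands in for your own host, so adjust the URL): the first request should come back with a 200, and a second request sent within 3 seconds should come back with a 429 and the Retry-After header:

# First request: expect HTTP/1.1 200 OK
curl -s -o /dev/null -D - -A "facebookexternalhit" https://example.com/

# Second request within 3 seconds: expect HTTP/1.1 429 Too Many Requests
# and a "Retry-After: 3" response header
curl -s -o /dev/null -D - -A "facebookexternalhit" https://example.com/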

You could improve this rule by using @pmf with a .data file and then initializing the collection with initcol:global=%{MATCHED_VAR}, so you are not limited to a single user agent per rule. I haven't tested that last step (the above is what I needed right now); I'll update my answer if I do.

UPDATE:

I've adapted the rule to read a file with all the user agents I want to rate limit, so a single pair of rules can be used across multiple bots/crawlers:

# Limit client hits by user agent
SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data" \
    "id:100008,phase:2,nolog,pass,setuid:%{tx.ua_hash},setvar:user.ratelimit_client=+1,expirevar:user.ratelimit_client=3"

SecRule USER:RATELIMIT_CLIENT "@gt 1" \
    "chain,id:1000009,phase:2,deny,status:429,setenv:RATELIMITED,log,msg:'RATELIMITED BOT'"                                                                                     
    SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data"

Header always set Retry-After "3" env=RATELIMITED

ErrorDocument 429 "Too Many Requests"

So, the file with user agents (one per line) lives in a subdirectory under the same directory as this rule file: /etc/modsecurity/data/ratelimit-clients.data. We use @pmf to read and parse the file (https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual-(v2.x)#pmfromfile). We initialize the USER collection keyed by the user agent with setuid:%{tx.ua_hash} (tx.ua_hash is set in the global scope in /usr/share/modsecurity-crs/modsecurity_crs_10_setup.conf), and we simply use the user collection instead of global. That's all!
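For reference, the .data file is just plain text with one user agent (or substring) per line; the entries below are only illustrative examples, so list whatever clients you want to throttle:

# /etc/modsecurity/data/ratelimit-clients.data
facebookexternalhit
Twitterbot
LinkedInBot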

Luciano Fantuzzi
  • I tested this rule (the first one, not the update) with curl by faking the user agent. If you send requests sequentially it blocks with 429 after the first one, as expected. But if you chain them with the backgrounding operator, like this: curl -A "facebookexternalhit" example.com & curl -A "facebookexternalhit" example.com & ..., then all requests return 200 – HomeIsWhereThePcIs Jun 19 '19 at 09:54
  • This happens because each process has an independent collection and they only get synced periodically. So if you send a lot of requests at the same time, they will all see the initial ratelimit_client value of 0 until it gets synced. If anyone knows a way around that, please share. – HomeIsWhereThePcIs Jun 19 '19 at 11:13
  • Nice catch. You may try asking in their support forum here https://sourceforge.net/projects/mod-security/lists/mod-security-users I'm subscribed too, so if you get an answer it would be interesting to read. – Luciano Fantuzzi Jun 19 '19 at 21:36

It might be better to use deprecatevar, and you can allow a bit more burst leniency: instead of the counter expiring all at once, it ticks down gradually (here by 3 points per second).

# Limit client hits by user agent
SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data" \
    "id:100008,phase:2,nolog,pass,setuid:%{tx.ua_hash},setvar:user.ratelimit_client=+1,deprecatevar:user.ratelimit_client=3/1"
SecRule USER:RATELIMIT_CLIENT "@gt 1" \
    "chain,id:100009,phase:2,deny,status:429,setenv:RATELIMITED,log,msg:'RATELIMITED BOT'"
    SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data"

Header always set Retry-After "6" env=RATELIMITED

ErrorDocument 429 "Too Many Requests"
Sasf54
  • Thanks! How is this different from the expirevar method in terms of request bursts? – Luciano Fantuzzi May 06 '22 at 03:58
  • It ticks down. 3 points per sec. – Sasf54 May 09 '22 at 08:14
  • There is an issue with this solution. Say you want to rate limit User-Agent1 and User-Agent2. If User-Agent1 exhausts the rate limit, User-Agent2 will be rate limited as well, even though User-Agent2 never made any requests. How do you make this solution User-Agent aware? – Nagri Sep 05 '22 at 09:28
  • You will need to key the rate-limit counter on a hash of IP + User-Agent and check whether it's greater than expected. You can use the IP collection (ip.uahash.ratelimit_client), but you have to define uahash first, under IP. Warning: the User-Agent field CAN be very long and can contain exploit code, so hash it. – Sasf54 Oct 10 '22 at 13:10
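A rough, untested sketch of that last suggestion, reusing the structure of the rules above: key the persistent collection on both the client IP and the hashed User-Agent, so one throttled bot does not affect another. It assumes tx.ua_hash is set by the CRS setup file; the rule ids and the decrement rate are only placeholders:

# Key the counter on client IP + hashed User-Agent (illustrative only)
SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data" \
    "id:100010,phase:2,nolog,pass,setuid:%{REMOTE_ADDR}_%{tx.ua_hash},setvar:user.ratelimit_client=+1,deprecatevar:user.ratelimit_client=3/1"
SecRule USER:RATELIMIT_CLIENT "@gt 1" \
    "chain,id:100011,phase:2,deny,status:429,setenv:RATELIMITED,log,msg:'RATELIMITED BOT'"
    SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data"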