I am using AWS WAF across multiple CloudFront distributions that serve different URLs. Generally speaking, it is working well. However, we have noticed unwanted activity on a few of the underlying sites that I want to block, but only on those sites, not across all of them.
It seemed simple enough to create a WAF rule that matches a regex against the URI and blocks on a match. However, it appears that AWS WAF does not include the host in its URI matching. For example, this rule:
Inspect URI, block on a regex match, with the regex being:
^(http|https):\/\/(www)?\.?example\.net\/(.*)?\/*.html$
These test URLs all match in my regex tester:
http://example.net/blah.html
https://example.net/blah.html
http://www.example.net/blah.html
https://example.net/stuff/blah.html
When I apply the rule in WAF, though, none of these requests are blocked.
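To illustrate what I suspect is happening, here is a minimal Python sketch. It assumes that WAF hands the regex only the path portion of the request (for example /blah.html) rather than the full URL; that is my guess, not confirmed behavior. The helper names are my own, and the sketch only mimics the suspected behavior rather than how WAF actually evaluates rules.

```python
import re
from urllib.parse import urlsplit

# The regex from my rule, copied unchanged.
PATTERN = re.compile(r"^(http|https):\/\/(www)?\.?example\.net\/(.*)?\/*.html$")

# The same URLs I used in my regex tester.
TEST_URLS = [
    "http://example.net/blah.html",
    "https://example.net/blah.html",
    "http://www.example.net/blah.html",
    "https://example.net/stuff/blah.html",
]


def matches_full_url(url: str) -> bool:
    """What my regex tester sees: scheme, host, and path together."""
    return PATTERN.search(url) is not None


def matches_path_only(url: str) -> bool:
    """What I assume WAF sees: only the path, with no scheme or host."""
    return PATTERN.search(urlsplit(url).path) is not None


for url in TEST_URLS:
    print(url, matches_full_url(url), matches_path_only(url))
```

If that assumption holds, the regex can never match inside WAF, because the string it is tested against never starts with http or https.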
Is there something else I can do to achieve this? I do not want to edit anything directly on my hosting servers, because that would be more of a maintenance headache and would not solve the problem I am trying to solve (which is to stop bots from spamming bad URLs and spiking my server with 404s).
I also realize someone may suggest a rate limit, which I do have in place, but the bots are coming from many different IPs, so that doesn't solve this particular case. A rate limit also does not take into account that I only want to block these requests for one very specific site. Instead, I just want to block some of the URL types that they keep trying to reach. In this case, it's thousands and thousands of HTML pages.