I am building a regexp for AWS WAF using a negative lookahead.

joe(?!(ann|jen))

However, the WAF console returned the following error:

WAFInvalidParameterException: Error reason: The parameter contains formatting that is not valid., field: REGEX_PATTERN_SET, parameter: joe(?!(ann|jen))

It seems that AWS WAF does not support this kind of regexp. The only reference I've found is this announcement: https://aws.amazon.com/about-aws/whats-new/2017/10/aws-waf-now-supports-regular-expressions-regex/

Has anyone run into a similar issue? Can you share how to fix it?

channa ly
  • `joe?!(ann|jen)` has no lookahead. `e` is made optional with the `?` quantifier. Do you have `joe(?!ann|jen)`? – Wiktor Stribiżew Jan 17 '20 at 08:34
  • Yes, I have `joe(?!ann|jen)`. Thank you for the correction. – channa ly Jan 17 '20 at 08:36
  • The documentation is [very unhelpful](https://docs.aws.amazon.com/waf/latest/developerguide/waf-regex-pattern-set-creating.html). They say the engine is PCRE, but it seems only POSIX-style features are enabled: *arbitrary zero-width assertions* and basically all the cool features are not supported. – Wiktor Stribiżew Jan 17 '20 at 08:43
  • Are you sure you need a regex with a negative lookahead? What about something like `where col LIKE '%joe%' and col NOT LIKE '%joeann%' and col NOT LIKE '%joejen%'`? – Wiktor Stribiżew Jan 17 '20 at 08:46
  • @WiktorStribiżew WAF doesn't use SQL syntax anywhere, and they don't let you combine negative and positive rules. You can match all or none of the rules. – musicin3d Aug 26 '20 at 19:46

1 Answer

Since negative lookaheads are unsupported, I broke mine out into several expressions that cover all cases. WAF lets you specify multiple expressions. It uses logical OR matching, so only one of them has to match. Using the example in the question, the solution could be...

joe[^aj]
joea[^n]
joean[^n]
joej[^e]
joeje[^n]

joe matches, unless he's followed by an a or a j. Then he's suspicious, so we go on to the next rule. If that a is followed by an n, then we're still suspicious, so we go on to the next rule. We repeat that process until we've decided whether or not the entire word is joeann or joejen.
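To sanity-check a pattern set like this before pasting it into WAF, it can help to compare it against the original lookahead in an engine that supports lookaheads. The following is a minimal sketch in Python, not something WAF itself runs; the helper names `matches_any` and `compare` are made up for illustration, and it assumes the pattern set behaves like an unanchored search with OR semantics, as described above.

```python
import re

# The five patterns from the answer, OR-ed together the way the
# answer describes a WAF pattern set matching.
SPLIT_PATTERNS = [
    r"joe[^aj]",
    r"joea[^n]",
    r"joean[^n]",
    r"joej[^e]",
    r"joeje[^n]",
]

# The lookahead from the question; Python's re module supports it.
LOOKAHEAD = re.compile(r"joe(?!ann|jen)")


def matches_any(text, patterns=SPLIT_PATTERNS):
    """Return True if at least one pattern is found anywhere in text."""
    return any(re.search(p, text) for p in patterns)


def compare(samples):
    """Return (text, split_result, lookahead_result) for each sample."""
    return [(s, matches_any(s), bool(LOOKAHEAD.search(s))) for s in samples]


if __name__ == "__main__":
    samples = ["joebob", "joeand", "joejet", "joeann", "joejen", "joe", "joean"]
    for row in compare(samples):
        print(row)
```

In this sketch the two approaches agree on every sample except `joe` and `joean`: each split pattern needs one more character after the prefix, so input that ends partway through the name is matched by the lookahead but not by the split set. Whether that matters depends on the field being inspected.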


My particular use case was URI matching. I wanted to throttle requests to an entire directory, except for one subdirectory (and all its subdirectories).

Say we want to throttle /my/dir but not anything in /my/dir/safe. We would do it like so...

^/my/dir/?$
^/my/dir/[^s]
^/my/dir/s[^a]
^/my/dir/sa[^f]
^/my/dir/saf[^e]
^/my/dir/safe[^/]

We follow the same process of identifying each letter in sequence.

"You can't start with S. Ok, you can start with S, but you can't also have an A. Ok ok, I'll let it slide, but you cannot have an F too. Ok fine, you're persistent, but..."

Notice we have to include a rule for the trailing slash /. This covers the optional slash in /my/dir/safe/ and all subdirectories such as /my/dir/safe/whatever.
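The URI rules can be exercised locally in the same way. This is a rough sketch in Python under the assumption that WAF applies each pattern to the URI path as an OR-ed search; `should_throttle` and `PREFIX_FIX` are hypothetical names, and the extra `PREFIX_FIX` pattern is not part of the answer and has not been tried on WAF.

```python
import re

# The six patterns from the answer.
URI_PATTERNS = [
    r"^/my/dir/?$",
    r"^/my/dir/[^s]",
    r"^/my/dir/s[^a]",
    r"^/my/dir/sa[^f]",
    r"^/my/dir/saf[^e]",
    r"^/my/dir/safe[^/]",
]

# Hypothetical extra pattern (an assumption, not from the answer):
# catches paths that stop partway through "safe", e.g. /my/dir/sa.
PREFIX_FIX = r"^/my/dir/(s|sa|saf)$"


def should_throttle(path, patterns=URI_PATTERNS):
    """Return True if any pattern matches the path."""
    return any(re.search(p, path) for p in patterns)


if __name__ == "__main__":
    paths = [
        "/my/dir", "/my/dir/", "/my/dir/other", "/my/dir/safety",
        "/my/dir/safe", "/my/dir/safe/", "/my/dir/safe/whatever",
        "/my/dir/sa",
    ]
    for p in paths:
        print(p, should_throttle(p), should_throttle(p, URI_PATTERNS + [PREFIX_FIX]))
```

With the six original patterns, `/my/dir/safe` and everything beneath it are left alone, while `/my/dir/safety` is still throttled. A path such as `/my/dir/sa` matches none of them, because each rule expects one more character; if that case matters, something like the extra pattern above might close the gap.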

musicin3d