I tried to reproduce a regular expression denial-of-service (ReDoS) attack in jshell, using the regexp (a+)+ and the input aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa! (with a large number of a characters):

Pattern.compile("(a+)+")
    .matcher("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!")
    .matches()

But this completes quickly every time I try it. Is Java's regexp implementation different from others, or is the linked Wikipedia page wrong?

(BTW, I'm using Java 11, if that's relevant.)

EDIT: It looks like this depends on the Java version: on Java 8 the match hangs, but on Java 9 and 11 it completes right away. What changed between those versions that could affect this? Are all regexes safe in Java now?

Is there a specific JEP that changed the regexp implementation? I would like to know which kinds of regexps are still a problem in newer Java versions.
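To compare Java versions more systematically than by pasting one long string into jshell, a small timing harness can help. The following is only a sketch: the class name `RedosTiming` and its helper methods are made up for illustration, and the input sizes are arbitrary. It avoids `String.repeat` so that it also compiles on Java 8.

```java
import java.util.regex.Pattern;

public class RedosTiming {
    // Builds the attack input: n copies of 'a' followed by '!'.
    public static String attackInput(int n) {
        StringBuilder sb = new StringBuilder(n + 1);
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.append('!').toString();
    }

    // Returns the elapsed milliseconds for one full-match attempt.
    public static long timeMatchMillis(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);
        long start = System.nanoTime();
        pattern.matcher(input).matches();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Sizes are kept small on purpose: on an engine that backtracks
        // exponentially, each extra 'a' roughly doubles the running time.
        for (int n = 10; n <= 26; n += 4) {
            long millis = timeMatchMillis("(a+)+", attackInput(n));
            System.out.println("n=" + n + " took " + millis + " ms");
        }
    }
}
```

Running the same class on Java 8 and on Java 11 should show whether the time grows rapidly with n or stays roughly flat. Swapping in another pattern, such as the ones from the comments, only requires changing the first argument of `timeMatchMillis`.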

Krzysztof Krasoń
  • If you have not seen it already: http://blog.mgm-tp.com/2012/06/regexp-java-puzzler-2/ – PM 77-1 Oct 29 '18 at 15:38
  • @PM77-1 I tried the code from that page and it works fast also, and prints the results as described there. – Krzysztof Krasoń Oct 29 '18 at 15:42
  • Your pattern is fine with most of the regex engines. The only problem with this kind of patterns is when it is followed with some other patterns. – Wiktor Stribiżew Oct 29 '18 at 16:43
  • @WiktorStribiżew not really. Java’s engine might have become a bit better, but just changing it to `((a+)+)+` makes it hang again. And it can be proven that each additional nesting, i.e. `(((a+)+)+)+`, `((((a+)+)+)+)+` raises the complexity, so Java’s engine has not learned to cope with this pattern, it just became a bit better at nested iterations. – Holger Oct 29 '18 at 18:17
  • @Holger Try `(a)(\1*)+x`. I doubt that many engines has protection against repeating patterns from backreferences. Though it is not a common pattern. – inf3rno Apr 28 '21 at 15:30
  • The following will hang in Java 17: `Pattern.compile("(.*a){10000}").matcher("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!").matches();` – xonya Mar 19 '23 at 12:46
  • FYI, more recent versions of Java seem to be immune to the patterns suggested by Holger . The patterns suggested by inf3rno and xonya still cause it to hang tho. (I needed an example of this for a unit test, and the test was passing in one version of Java but failing on the next lol. @inf3rno's example got my test working again tho, thanks!) – user435779 Aug 21 '23 at 16:22

2 Answers


I am currently running Java 8 and the following code hangs:

Pattern.compile("(a|aa)+")
       .matcher("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab")
       .matches()

Since the same kind of match completes quickly for you on Java 11 (and you have also tested Java 9), something in the regex implementation evidently changed between those versions.

From looking at the source code of Matcher in Java 11, we find the following addition that isn't present in Java 8:

/**
 * Storage used by top greedy Loop node to store a specific hash set to
 * keep the beginning index of the failed repetition match. The nodes
 * themselves are stateless, so they rely on this field to hold state
 * during a match.
 */
IntHashSet[] localsPos;

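This per-match storage, together with a substantial amount of other added code, appears to be one of the main reasons why regular expression matching in Java 9+ finishes much faster than in Java 8 and earlier: as the comment says, a greedy loop can record the start indices at which a repetition has already failed, so it does not have to retry them.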
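To illustrate why remembering failed start positions helps, here is a toy backtracking matcher for (a|aa)+ that counts its own steps, once without and once with a set of failed indices. This is not the JDK's code: the class `FailedPositionMemo`, its methods, and the use of `HashSet<Integer>` in place of the JDK's internal `IntHashSet` are all assumptions made for the sketch.

```java
import java.util.HashSet;
import java.util.Set;

public class FailedPositionMemo {
    private static long steps;

    // Can s[i..] be consumed completely by one or more repetitions of
    // "a" or "aa"? `failed` holds start indices already known to be dead
    // ends; pass null to disable the memoization.
    private static boolean consume(String s, int i, Set<Integer> failed) {
        steps++;
        if (failed != null && failed.contains(i)) {
            return false; // this start index already failed once
        }
        // First alternative: "a"
        if (s.startsWith("a", i)
                && (i + 1 == s.length() || consume(s, i + 1, failed))) {
            return true;
        }
        // Second alternative: "aa"
        if (s.startsWith("aa", i)
                && (i + 2 == s.length() || consume(s, i + 2, failed))) {
            return true;
        }
        if (failed != null) {
            failed.add(i); // remember the failure for later attempts
        }
        return false;
    }

    // Full match of s against (a|aa)+ using the toy matcher.
    public static boolean matches(String s, boolean memoize) {
        steps = 0;
        return consume(s, 0, memoize ? new HashSet<Integer>() : null);
    }

    // Number of calls to consume() needed for one match attempt.
    public static long countSteps(String s, boolean memoize) {
        matches(s, memoize);
        return steps;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 30; i++) {
            sb.append('a');
        }
        String input = sb.append('b').toString();
        System.out.println("without memo: " + countSteps(input, false) + " steps");
        System.out.println("with memo:    " + countSteps(input, true) + " steps");
    }
}
```

Without the set, every failing suffix is re-explored through both alternatives, so the step count grows exponentially with the number of a characters. With the set, each start index fails at most once, so the work stays linear in the input length.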

Jacob G.

According to the article RSPEC-2631, the ReDoS issue has been handled in Java 9 and later:

Java runtimes like OpenJDK 9+ are mitigating this problem by having additional protections in their implementation of regular expression evaluation. In those runtime the example above is not vulnerable.

Happy
  • Java 9+ is still vulnerable to ReDoS, just needs trickier patterns than `(a|aa)+`. See comments elsewhere; this is for those who just look at the accepted answer. This problem certainly will never go away. RE2, a regular expression engine, is by design immune to these exponential runtime problems, but they had to drop certain regular expression features for this (compared to Java/Perl regular expressions). – ddekany Aug 31 '22 at 21:43
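Since some patterns can still run for a very long time on newer Java versions, one commonly used workaround is to give the matcher a `CharSequence` that checks a deadline on every character access. The sketch below assumes that approach; `DeadlineMatch`, `RegexTimeoutException` and `matchesWithin` are hypothetical names, not part of `java.util.regex`, and the extra `System.nanoTime()` call per character adds overhead.

```java
import java.util.regex.Pattern;

public class DeadlineMatch {
    // Thrown when matching runs past its deadline (hypothetical type).
    public static final class RegexTimeoutException extends RuntimeException {
        public RegexTimeoutException(String message) {
            super(message);
        }
    }

    // Wraps the input and aborts once the deadline has passed. The regex
    // engine reads its input through charAt, so a runaway match hits this
    // check again and again.
    private static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            if (System.nanoTime() - deadlineNanos >= 0) {
                throw new RegexTimeoutException("regex match exceeded its deadline");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    // Full match with a time budget; throws RegexTimeoutException on overrun.
    public static boolean matchesWithin(Pattern pattern, CharSequence input, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        return pattern.matcher(new DeadlineCharSequence(input, deadline)).matches();
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(a+)+");
        System.out.println(matchesWithin(pattern, "aaaa", 1000));
    }
}
```

A caller would catch `RegexTimeoutException` and treat the input as rejected. This bounds the damage of a slow match but does not make the pattern itself any cheaper.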