Background
On every large commercial Java project I've worked on, I have come across numerous calls to Pattern.compile(...), even in code paths that are executed many times, e.g.
    public String rewriteUrlWhichIsDoneABajillionTimes(final String requestedUrl) {
        Matcher m = Pattern.compile("^/([^/]+)\\.html$").matcher(requestedUrl);
        if (!m.matches()) {
            return null;
        }
        // Do processing here
        ...
    }
On every project where I found code like this, I told at least one colleague that Pattern.compile(...) is expensive (the regular expression is parsed and compiled on every call) and that its result is not cached, but that java.util.regex.Pattern instances are immutable and thread-safe and can therefore be safely re-used. Each time, they told me that they had not known this.
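To make the intended usage concrete, here is a minimal sketch of the method above with the pattern compiled only once. The class name UrlRewriter, the constant name HTML_PAGE and the "processing" step (returning the captured group) are placeholders of mine, not code from any real project.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlRewriter {
    // Compiled once, when the class is initialized. Pattern instances are
    // immutable and thread-safe, so this constant can be shared freely.
    private static final Pattern HTML_PAGE = Pattern.compile("^/([^/]+)\\.html$");

    private UrlRewriter() {}

    static String rewriteUrl(final String requestedUrl) {
        // Matcher is NOT thread-safe, so a new one is created for each call.
        Matcher m = HTML_PAGE.matcher(requestedUrl);
        if (!m.matches()) {
            return null;
        }
        // Placeholder processing (assumption): return the captured page name.
        return m.group(1);
    }
}
```

Only the Pattern is shared here; creating a Matcher per call is cheap compared with recompiling the expression.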
Potential solutions
Correct future usage of the API
One "solution" could be to (try to) force people to read the Java standard library documentation and to use the standard library "correctly", but prescriptive methods often do not work so well.
Correct past usage of the API
Alternatively (or additionally), it would be possible to "clean up" any bad usages of Pattern.compile(...) wherever they are found, but this is likely to be a never-ending task, since (in my experience) people will continue to use Pattern.compile(...) incorrectly over and over again.
Correct the API
So why not simply change the Pattern.compile(...) method so that it pools objects and returns the same instance for equivalent input? This would instantly apply a fix to possibly billions of lines of code around the world (as long as the code is run on a JRE which includes the change). The only possible downside I can imagine is that the software would have a larger memory footprint, but given how much memory most computers have these days, I doubt that this would cause problems anywhere other than in edge cases. On the other hand, a huge number of programs would likely run much faster. So why didn't/doesn't Oracle implement an object pool for Pattern, similarly to how they did for strings (the string pool) or for boxed primitives (e.g. the Integer.valueOf cache)?
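To illustrate the kind of pooling I have in mind, here is a rough application-level sketch. It is not how the JDK behaves: the class PatternCache and its get method are hypothetical names of mine, and the sketch assumes patterns without flags and a small, bounded set of distinct regex strings (the map never evicts entries).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the standard library.
final class PatternCache {
    // Maps the regex source string to its compiled Pattern.
    // Assumption: unbounded, so only suitable for a limited set of regexes.
    private static final ConcurrentMap<String, Pattern> CACHE = new ConcurrentHashMap<>();

    private PatternCache() {}

    static Pattern get(final String regex) {
        // Compiles the regex only if no entry exists yet for this string;
        // later calls with an equal string return the same Pattern instance.
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }
}
```

A caller would then write PatternCache.get("^/([^/]+)\\.html$").matcher(requestedUrl) in place of the Pattern.compile(...) call. The sketch also shows the cost of this approach: every call pays for a map lookup, and the cache grows with each distinct regex, which is the memory footprint concern mentioned above.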