
I wonder if this:

object Foo {
  val regex = "some complex regex".r
  def foo(): Unit = {
    // use regex
  }
}

and this:

object Foo {
  def foo(): Unit = {
    val regex = "some complex regex".r
    // use regex
  }
}

will have any performance difference, i.e., will the Scala compiler recognize that "some complex regex".r is a constant and cache it, so that it is not recompiled on every call?

lyomi
  • You may want to give an example with a method other than the main method - `"some complex regex".r` will get executed only once in both examples, regardless of any compiler optimization, since the main method will only be called once (to launch the program). Unless you call the main method from within your program of course, but that's not what people reading the example would expect. – Cyäegha Sep 26 '14 at 08:53
  • @Cyäegha thanks for noting that, corrected. – lyomi Sep 26 '14 at 08:57
  • In order to be able to do this, the compiler would have to prove that [`StringLike.r`](http://Scala-Lang.Org/api/current/index.html#scala.collection.immutable.StringLike@r:scala.util.matching.Regex) is pure, which is (in the general case) equivalent to solving the Halting Problem. For simple methods, it might still be possible, but I don't know whether the compiler even attempts that, considering that it's very likely it won't be able to prove anything anyway. – Jörg W Mittag Sep 26 '14 at 09:10

1 Answer


There will be a difference at runtime. The expression in the first example is evaluated only once, when the object Foo is initialized. The expression in the second example is evaluated every time you call Foo.foo(). Evaluation here means applying the method "r", which an implicit conversion in scala-library adds to strings:

scala> ".*".r
res40: scala.util.matching.Regex = .*

This method compiles the regular expression every time you call it; there is no caching.
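A minimal sketch that makes the missing cache visible. The object name `RegexIdentityDemo` and the helper `build` are hypothetical, chosen only for illustration. It relies on `Regex.pattern`, which exposes the underlying `java.util.regex.Pattern`, and on `ne`, Scala's reference-inequality check:

```scala
import scala.util.matching.Regex

object RegexIdentityDemo {
  // Hypothetical helper: builds a Regex from the same string literal on every call.
  def build(): Regex = "[a-c]+".r

  def main(args: Array[String]): Unit = {
    val first = build()
    val second = build()
    // Reference inequality: are the two Regex wrappers distinct objects?
    println(first ne second)
    // Does each wrapper hold its own separately compiled Pattern?
    println(first.pattern ne second.pattern)
    // The source text of both patterns is nevertheless identical.
    println(first.pattern.pattern == second.pattern.pattern)
  }
}
```

If both reference checks hold, each call to `.r` paid for a fresh `Pattern.compile`, even though the pattern text never changed.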

Btw, any naive runtime caching of regexps is vulnerable to OutOfMemoryError. I believe it could be implemented safely with a WeakHashMap, but Java's current Pattern implementation (which underlies Scala's Regex) does not do this, probably because such a cache may not have a predictable effect on performance (the GC may have to remove most of the cached values every time it runs). A cache with eviction is more predictable, but it is still not an easy route (who is going to choose the timeout or size for it?). As for a Scala way, some smart macro could do the optimization at compile time (caching only regexps built from string constants), but by default:

The Scala compiler also has no optimizations for regexps, because regexps are not part of the Scala language.

So it's better to move static "".r constructions out of the function.
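A rough sketch of both options discussed above, under some assumptions. The `input` parameter on `foo`, the `RegexCache` object, and the `MaxEntries` limit are hypothetical additions for illustration and are not part of the question or of any library. The first object hoists a constant pattern into a `val`. The second shows one possible size-bounded cache for patterns that are only known at runtime, built on `java.util.LinkedHashMap` in access order with `removeEldestEntry`:

```scala
import java.util.{LinkedHashMap, Map => JMap}
import scala.util.matching.Regex

object Foo {
  // Compiled once, when the object Foo is first initialized.
  private val regex: Regex = "some complex regex".r

  // `input` is a hypothetical parameter; the original foo() takes none.
  def foo(input: String): Boolean =
    regex.findFirstIn(input).isDefined
}

// Hypothetical LRU cache for patterns that are not compile-time constants.
object RegexCache {
  // Arbitrary limit, picked only for this sketch.
  private val MaxEntries = 64

  // accessOrder = true makes iteration order least-recently-used first.
  private val cache = new LinkedHashMap[String, Regex](16, 0.75f, true) {
    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
    override protected def removeEldestEntry(eldest: JMap.Entry[String, Regex]): Boolean =
      size() > MaxEntries
  }

  // Synchronized because even get() reorders an access-ordered LinkedHashMap.
  def get(pattern: String): Regex = synchronized {
    Option(cache.get(pattern)).getOrElse {
      val compiled = pattern.r
      cache.put(pattern, compiled)
      compiled
    }
  }
}
```

For a pattern that is a string constant, the hoisted `val` is simpler and avoids choosing a cache size at all. The cache only becomes interesting when the same dynamic patterns recur.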

dk14
  • Whether caching happens or not is not dependent on whether regular expressions are part of the language. For example, the implementation of `r` could cache compiled patterns keyed by the uncompiled pattern string. The real reason why Scala regular expressions are not cached is that the underlying java.util.regex package does not cache as answered [here](http://stackoverflow.com/questions/13420321/does-pattern-compile-cache) – Sim Aug 03 '15 at 01:19
  • @Sim You can clearly see that this question is about **compiler** from its title – dk14 Aug 03 '15 at 02:06
  • I saw that and I also saw from the way the question is worded that the objective was understanding performance differences as opposed to compiler capabilities. I know of no programming language where there is a language-level regex syntax, e.g., `/[a-c]/` in Ruby or JavaScript, which a compiler can easily detect, but there isn't a way to create a regex by passing a string to some constructor, which is not easy for a compiler to detect. That's why caching is typically implemented at the library level and not at the compiler level. Ruby works that way and, I believe, so does JavaScript. – Sim Aug 03 '15 at 03:20
  • @Sim I see an explicit question about the compiler and Scala here; I can't read minds and would prefer not to :). And I explicitly said that `r` compiles the regexp every time you call it (which obviously means "no caching", put as simply as possible) - I didn't say that it's somehow related to a compile-time/runtime problem. The information about the impossibility of compile-time optimizations is the last (and additional) sentence in my answer, intended to make clear that there is *also* no compile-time optimization of regexps in Scala. – dk14 Aug 03 '15 at 04:13
  • @Sim I never said that "caching is dependent on whether regular expressions are part of the language" - you're just trying to read my mind and it doesn't work :). I said that scala compiler (explicitly mentioned in the question, twice) doesn't give a s*** about regexps – dk14 Aug 03 '15 at 04:19
  • @Sim actually "**hoisting of regular expressions** (as a compiler feature) is dependent on whether regular expressions are part of the language" – dk14 Aug 03 '15 at 04:59