2

I just read about Regexp.match?('string') for Ruby 2.4 and was very excited to see the results! But when I tried it out in my application, I hardly saw any gains.

str = 's'
Benchmark.bm do |b|
  b.report(".match         ") { 100000.times { 'string'.match /s/ } }
  b.report(".match?        ") { 100000.times { 'string'.match? /s/ } }
  b.report(".match dynamic ") { 100000.times { 'string'.match /#{str}/ } }
  b.report(".match? dynamic") { 100000.times { 'string'.match? /#{str}/ } }
end
                 user     system      total        real
.match           0.140000   0.000000   0.140000 (  0.143658)
.match?          0.020000   0.000000   0.020000 (  0.029628)
.match dynamic   0.370000   0.010000   0.380000 (  0.371935)
.match? dynamic  0.260000   0.010000   0.270000 (  0.278614)

The benchmark shows a tremendous gain from .match to .match?, but once I start dynamically building the complicated regexes my app requires, I lose most of that gain.

My question is: why is there such a drastic difference, and can I somehow build a dynamic regexp that keeps the performance of .match? in the example below? I ran my benchmarks on ruby 2.4.2p198.

str = 'my text with words'
reg_str = '((^|[\s\"“])(cherry pie|cherry pies)($|[\s\"”\.\,\:\?\!])|(\#(cherrypie|cherrypies)($|\s|\#|\.|\,|\:|\?|\!)))'
puts Benchmark.measure {
  100000.times { str.match? /#{reg_str}/i }
}
9.380000   0.010000   9.390000 (  9.403821)

puts Benchmark.measure {
  100000.times { str.match? /((^|[\s\"“])(cherry pie|cherry pies)($|[\s\"”\.\,\:\?\!])|(\#(cherrypie|cherrypies)($|\s|\#|\.|\,|\:|\?|\!)))/i }
}  
0.020000   0.000000   0.020000 (  0.017900)
ayeezy

3 Answers

4

Use the /o modifier, so the interpolation is only performed once:

str = 's'
Benchmark.bm do |b|
  b.report(".match         ") { 100000.times { 'string'.match /s/ } }
  b.report(".match?        ") { 100000.times { 'string'.match? /s/ } }
  b.report(".match dynamic ") { 100000.times { 'string'.match /#{str}/o } }
  b.report(".match? dynamic") { 100000.times { 'string'.match? /#{str}/o } }
end
                 user     system      total        real
.match           0.120000   0.010000   0.130000 (  0.117889)
.match?          0.020000   0.000000   0.020000 (  0.027255)
.match dynamic   0.110000   0.000000   0.110000 (  0.113300)
.match? dynamic  0.030000   0.000000   0.030000 (  0.034755)
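
One caveat with /o, sketched below: the interpolation happens only the first time that particular regexp literal is evaluated, so any later change to the interpolated value is silently ignored. The helper name `match_once?` is made up for this illustration and is not part of the question's code.

```ruby
# Hypothetical helper: the /o flag keeps the regexp built on the first call.
def match_once?(text, pattern)
  text.match?(/#{pattern}/o)
end

first  = match_once?('foo', 'foo') # builds /foo/ and keeps it for this literal
second = match_once?('bar', 'bar') # still tests against /foo/, not /bar/

puts first  # true
puts second # false
```

So /o only fits when the interpolated value never changes for the lifetime of the process. For many different patterns, building the Regexp objects up front is the safer route.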
dug
  • FYI: in real life nobody matches anything 100_000 times. That said, your advice is good for the marketing department: tests will show a drastic performance increase, while in reality this change brings zero profit. In other words, **this answer is plain wrong.** – Aleksei Matiushkin May 22 '18 at 19:09
  • This looks promising to me. My use case is to scan for the same 10,000 regexes millions of times. What's the trade-off of using the /o modifier? Is it just taking up a lot of memory? – ayeezy May 23 '18 at 03:40
  • @mudasobwa I have matched a huge number of times in real life (not Tinder). For example when processing logs, request headers, etc. – Kimmo Lehto May 23 '18 at 08:51
3

You are basically measuring regexp interpolation versus literal instantiation. The time spent in match? itself hardly affects the measurement at all.

To compare match? against match, one should instantiate the regexp upfront:

str = 'my text with words'
reg_str = '...'
reg = /#{reg_str}/i
puts Benchmark.measure {
  100000.times { str.match? reg }
}

The result of the above will be roughly the same as in your second test.

That said, the string/regexp interpolation is the beast that takes most of the time. If you need complicated interpolation in a regular expression inside the loop, the difference between match? and match won’t be noticeable, since the interpolation is the bottleneck, not the matching.
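
If the pattern strings are only known at runtime, one possible way to pay the compilation cost once per pattern is to memoize the compiled Regexp objects. This is only a sketch: `regex_cache` and `cached_match?` are hypothetical names, and the pattern is a simplified stand-in for the one in the question.

```ruby
# Hypothetical cache: compile each pattern string once, then reuse the Regexp.
def regex_cache
  @regex_cache ||= Hash.new do |cache, source|
    cache[source] = Regexp.new(source, Regexp::IGNORECASE)
  end
end

# Looks up (or builds) the compiled Regexp, then runs the cheap match? check.
def cached_match?(text, source)
  regex_cache[source].match?(text)
end

reg_str = '(cherry pie|cherry pies)' # simplified stand-in pattern
puts cached_match?('I like Cherry Pie', reg_str)  # true
puts cached_match?('my text with words', reg_str) # false
```

The trade-off is memory: every distinct pattern string keeps its compiled Regexp alive for as long as the cache exists.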

Aleksei Matiushkin
  • In my 2nd benchmark, hardcoding the regex vs. interpolating it causes something like a 500x difference in performance. I am still hoping there is a way to generate a regex from a string that would be quicker. – ayeezy May 23 '18 at 03:38
2

The speed improvement of match? comes from not allocating the MatchData objects and globals like $1. It just returns true or false. You can't use match? if you need to return something from the regex.
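
A small sketch of that difference, with helper names invented for illustration: match hands back a MatchData with the captures, while match? leaves the last-match variable `$~` untouched.

```ruby
# Hypothetical helpers contrasting the two calls.
def first_capture(text, regex)
  data = text.match(regex) # allocates a MatchData, or returns nil on no match
  data && data[1]
end

def last_match_after_predicate(text, regex)
  text.match?(regex) # returns only true or false
  $~                 # still nil here: match? does not set the last-match variables
end

p first_capture('string', /s(t)r/)              # "t"
p last_match_after_predicate('string', /s(t)r/) # nil
```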

match? won't be any faster at compiling regex-strings into Regexp objects.

Perhaps in your code you can first create the regexes and then use those in the loop instead of constantly recreating them:

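pattern = 'abcd'

# bad: the interpolated regex is rebuilt on every iteration
lines.each { |line| puts "Found a match!" if line.match?(/#{pattern}/) }

# good: build the regex once and reuse it inside the loop
regex = /#{pattern}/
lines.each { |line| puts "Found a match!" if line.match?(regex) }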
Kimmo Lehto