1

In Ruby 1.9 I would use `String#match(regexp, start_index)`. I'm sure there must be a (computationally efficient) equivalent in Ruby 1.8, but I can't find it. Do you know what it is?
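For reference, here is a minimal sketch of the Ruby 1.9 behaviour in question. The sample string and the helper name `match_from` are made up for illustration; only the two-argument `String#match` is the real API.

```ruby
# Hypothetical helper wrapping the Ruby 1.9+ two-argument String#match.
def match_from(str, regexp, start_index)
  # The search begins at start_index; offsets in the returned MatchData
  # are still relative to the start of the whole string.
  str.match(regexp, start_index)
end

text = "foo=1 bar=2 foo=3"
m = match_from(text, /foo=(\d)/, 5)
puts m[1]        # capture from the second "foo=", the first one is skipped
puts m.begin(0)  # offset of the match within the full string
```

The goal is the same result on Ruby 1.8 without the cost growing with `start_index`.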

mu is too short
Alex D

2 Answers

3

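You could start the regexp with `^.{start_index}` (strictly `\A.{start_index}` with the `/m` flag, since in Ruby `^` matches at the start of any line and `.` matches newlines only under `/m`),

or take the substring first before performing the match.

Alternatively, if you're constrained to using Ruby 1.8 but can install your own libraries, then you could use Oniguruma.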
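A rough sketch of the first two options, not benchmarked. The helper names `match_with_prefix` and `match_in_substring` are hypothetical, and the lazy `.*?` between the skipped prefix and the pattern is my own addition so that the pattern is searched for from `start_index` onward rather than required to match exactly there.

```ruby
# Hypothetical helper: anchor at the start of the string, skip
# start_index characters, then search for the caller's pattern.
def match_with_prefix(str, regexp, start_index)
  # \A anchors to the start of the string and /m lets "." cross newlines.
  # Interpolating the Regexp embeds it with its own flags. The caller's
  # pattern becomes group 1, so its own groups are shifted by one.
  prefixed = /\A.{#{start_index}}.*?(#{regexp})/m
  prefixed.match(str)
end

# Hypothetical helper: copy the tail of the string and match against it.
def match_in_substring(str, regexp, start_index)
  tail = str[start_index..-1]   # nil when start_index is past the end
  # Offsets in the result are relative to start_index, not to str.
  tail && tail.match(regexp)
end

text = "foo=1 bar=2 foo=3"

m1 = match_with_prefix(text, /foo=(\d)/, 5)
puts m1[2]              # the caller's first group, shifted to index 2
puts m1.begin(1)        # offset of the real match in the full string

m2 = match_in_substring(text, /foo=(\d)/, 5)
puts m2[1]
puts m2.begin(0) + 5    # add start_index back to get the full-string offset
```

Both variants change what the `MatchData` reports (group numbers in the first, offsets in the second), so callers have to adjust for that.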

mikej
  • Sure... but notice I asked for a "computationally efficient" alternative. – Alex D Aug 29 '12 at 21:19
  • @AlexD: What makes you think `^.{n}` isn't computationally efficient? The regex engine probably just offsets into the string and starts working from there. A quick bit of benchmarking suggests that `^.{n}` is slightly faster than the obvious alternative (`s[i..-1].match(re)`). – mu is too short Aug 29 '12 at 22:04
  • I tried it in `irb` before answering, and it's inefficient. O(n) in the length of the string, and I will be using this to parse large files which may be tens or hundreds of megabytes. Based on the tests I have done, it could take as much as 30 seconds for a single match in such a case. – Alex D Aug 30 '12 at 05:21
  • @mikej, installing Oniguruma is an interesting idea (I didn't know it was possible), but unfortunately this is for a (somewhat popular) gem, and it's simply not acceptable to tell all Ruby 1.8 users that "you have to install Oniguruma to use this gem". – Alex D Aug 30 '12 at 05:34
0

As far as I can tell, there is no efficient way to match a Regexp against a large string, starting from an arbitrary index, in pure Ruby 1.8.

This seems like a major flaw. I guess the moral of the story is: use Ruby 1.9!

Alex D