In Ruby 1.9 I would use `String#match(regexp, start_index)`. I'm sure there must be a (computationally efficient) equivalent in Ruby 1.8, but I can't find it. Do you know what it is?
1

mu is too short

Alex D
2 Answers
3
You could start the regexp with `^.{start_index}` or take the substring first before performing the match.
Alternatively, if you're constrained to using Ruby 1.8, but can install your own libraries then you could use Oniguruma.
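A minimal sketch of the first two suggestions, for illustration only. The helper names `match_from_with_prefix` and `match_from_with_substring` are hypothetical and do not come from the answer. The prefix variant assumes `\A` and an inline `(?m:...)` group in place of a literal `^.{start_index}`, because in Ruby `^` matches at the start of any line and `.` does not cross newlines by default. It also inserts a lazy `.*?` so the search may succeed at or after `start_index`, and it would not suit patterns that contain numbered backreferences. Neither helper makes any claim about performance.

```ruby
# Workaround 1: skip `start_index` characters inside the pattern itself.
# Returns [matched_text, offset_in_str], or nil when nothing matches.
def match_from_with_prefix(str, re, start_index)
  # \A pins the pattern to the start of the string; (?m:...) lets "." cross
  # newlines while skipping the prefix, without changing the options of `re`.
  prefixed = Regexp.new("\\A(?m:.{#{start_index}}.*?)(#{re.source})", re.options)
  m = prefixed.match(str)
  m && [m[1], m.begin(1)]
end

# Workaround 2: take the substring first, then shift the offset back.
# Note that anchors such as \A or ^ in `re` now see the substring's start.
def match_from_with_substring(str, re, start_index)
  tail = str[start_index..-1]
  return nil if tail.nil?          # start_index is past the end of str
  m = re.match(tail)
  m && [m[0], m.begin(0) + start_index]
end

s = "foo bar\nfoo baz"
p match_from_with_prefix(s, /foo \w+/, 4)
p match_from_with_substring(s, /foo \w+/, 4)
```

Both calls skip the first `foo bar` and report the second occurrence along with its offset in the original string.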

mikej
- Sure... but notice I asked for a "computationally efficient" alternative. – Alex D Aug 29 '12 at 21:19
- @AlexD: What makes you think `^.{n}` isn't computationally efficient? The regex engine probably just offsets into the string and starts working from there. A quick bit of benchmarking suggests that `^.{n}` is slightly faster than the obvious alternative (`s[i..-1].match(re)`). – mu is too short Aug 29 '12 at 22:04
- I tried it in `irb` before answering, and it's inefficient. O(n) in the length of the string, and I will be using this to parse large files which may be tens or hundreds of megabytes. Based on the tests I have done, it could take as much as 30 seconds for a single match in such a case. – Alex D Aug 30 '12 at 05:21
- @mikej, installing Oniguruma is an interesting idea (I didn't know it was possible), but unfortunately this is for a (somewhat popular) gem, and it's simply not acceptable to tell all Ruby 1.8 users that "you have to install Oniguruma to use this gem". – Alex D Aug 30 '12 at 05:34
0
As far as I can tell, there is no efficient way to match a Regexp against a large string, starting from an arbitrary index, in pure Ruby 1.8.
This seems like a major flaw. I guess the moral of the story is: use Ruby 1.9!
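For comparison, a minimal sketch of the Ruby 1.9 call mentioned in the question. The wrapper name `match_from` is hypothetical and exists only to keep the example self-contained. It requires Ruby 1.9 or later, since the two-argument form of `String#match` is not available in 1.8.

```ruby
# Ruby 1.9+: String#match accepts a position at which to start searching.
# Returns [matched_text, offset_in_str], or nil when nothing matches.
def match_from(str, re, start_index)
  m = str.match(re, start_index)
  # Offsets in the MatchData are relative to the original string.
  m && [m[0], m.begin(0)]
end

p match_from("foo bar foo baz", /foo \w+/, 4)
```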

Alex D