10

I am writing some Perl scripts where I need to do a lot of string matching. For example:

my $str1 = "this is a test string";
my $str2 = "test";

To check whether $str1 contains $str2, I found two approaches:

Approach 1: use the index function:

if ( index($str1, $str2) != -1 ) { .... }

Approach 2: use a regular expression:

if( $str1 =~ /$str2/ ) { .... }

Which is better, and when should each be used over the other?

Evan Carroll
Kedar Joshi
  • For simplicity, let's assume that strings do not contain regexp meta-characters (as with my use case). What would be the answers in that case? – Kedar Joshi Jun 09 '15 at 23:12
  • Often you're not looking for a match just *anywhere* in your string. In your example, you probably want `this is a test string` to match but you may not want to match `I am a Protestant`. In fact, looking for a match just anywhere (whether with `index` or an unanchored regex) is a common logic bug. – ThisSuitIsBlackNot Jun 09 '15 at 23:24
  • I think that regular expressions try to optimize locating the first letter of your expression. A regex might be faster; better to benchmark both ways. I also think a regex lets you continue to the next match from where the previous match left off. But it could be that index lets you set a start position (can't remember). –  Jun 09 '15 at 23:31
  • The equivalent of `index($str1, $str2) != -1` is actually `$str1 =~ /\Q$str2/` – ikegami Jun 10 '15 at 00:50
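
The points raised in these comments can be illustrated with a minimal sketch. The sample strings (`price: 3.50`, `price: 3x50`) and the variable names are made up for illustration; only `index`, `\Q...\E` and `\b` come from Perl itself. It shows that an unquoted interpolated pattern treats `.` as a metacharacter, that `\Q` restores the literal behaviour of `index`, and that `\b` avoids matching inside a longer word.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen so the needle contains a metacharacter.
my $needle = "3.50";

# Plain substring test: every character of $needle is taken literally.
my $found_index = index("price: 3.50", $needle) != -1;

# Unquoted interpolation: "." matches any character, so "3x50" matches too.
my $found_loose = "price: 3x50" =~ /$needle/;

# \Q...\E quotes metacharacters, so this behaves like the index test.
my $found_quoted = "price: 3x50" =~ /\Q$needle\E/;

# Both index and an unanchored regex find "test" inside "Protestant";
# \b restricts the match to a whole word.
my $anywhere   = index("I am a Protestant", "test") != -1;
my $whole_word = "I am a Protestant" =~ /\btest\b/;

printf "index literal:  %s\n", $found_index  ? "match" : "no match";
printf "regex unquoted: %s\n", $found_loose  ? "match" : "no match";
printf "regex quoted:   %s\n", $found_quoted ? "match" : "no match";
printf "anywhere:       %s\n", $anywhere     ? "match" : "no match";
printf "whole word:     %s\n", $whole_word   ? "match" : "no match";
```

If the strings are guaranteed to be free of metacharacters, as assumed in the first comment, the quoted and unquoted patterns behave the same.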

3 Answers

8

Here is the result of Benchmark:

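use strict;
use warnings;
use Benchmark qw(:all);

# A negative count runs each sub for at least that many CPU seconds.
my $count = -1;
my $str1 = "this is a test string";
my $str2 = "test";
my $str3 = qr/test/;    # regex compiled once, outside the timed subs

cmpthese($count, {
    'type1' => sub { if ( index($str1, $str2) != -1 ) { 1 } },   # substring search
    'type2' => sub { if ( $str1 =~ $str3 ) { 1 } },              # regex match
});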

Result (when a match happens):

           Rate type2 type1
type2 1747627/s    --  -70%
type1 5770465/s  230%    --

To draw a conclusion, also test the case where there is no match:

my $str2 = "text";
my $str3 = qr/text/;

Result (when a match does not happen):

           Rate type2 type1
type2 1857295/s    --  -67%
type1 5560630/s  199%    --

Conclusion:

In this benchmark, the index function is roughly three times faster than the regexp match, whether or not the match succeeds.
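
The benchmark above times a precompiled `qr//` object rather than the interpolated form from the question. A sketch that times all three forms side by side is given below; the labels (`index`, `precompiled`, `interpolated`) and the `%candidates` hash are names chosen here for illustration. No result table is shown because the numbers depend on the Perl version and the machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str1 = "this is a test string";
my $str2 = "test";
my $re   = qr/\Q$str2\E/;    # compiled once, metacharacters quoted

# Hypothetical labels for the three variants being compared.
my %candidates = (
    'index'        => sub { index($str1, $str2) != -1 },
    'precompiled'  => sub { $str1 =~ $re },
    'interpolated' => sub { $str1 =~ /\Q$str2\E/ },
);

# A negative count runs each sub for at least that many CPU seconds.
cmpthese(-1, \%candidates);
```

Swapping `$str2` for a string that does not occur in `$str1` covers the no-match case in the same way.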

OmarOthman
Toto
  • Beyond the benchmark, I would like to understand why index is so much faster for a simple match like this. OK, there is compile time, but that is normally incurred only once in such a case. I would like to know the difference in terms of the algorithm. – Laurent DECLERCQ a.k.a Nuxwin Oct 14 '18 at 18:15
  • 1
    @Nuxwin: I am not able to give you that information, but you can get useful details by running the `B::Debug` or `B::Concise` modules. See: https://perldoc.perl.org/index-modules-B.html – Toto Oct 15 '18 at 08:41
2

When I see code that uses index, I usually see an index within an index within an index, and so on. There is also more branching: "if found, look for this; otherwise, look for that." Almost always a single regex would have worked. So I almost always use a regex unless there is a specific reason to use index.

Unfortunately, most programmers I run into don't read regexes well, so for maintainability index should probably be used more often than I use it.
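
To make the trade-off concrete, here is a small sketch of the kind of code described above. The input line and the `user=` key are invented for illustration. The first version chains `index` and `substr` calls with a branch; the second does the same extraction with one pattern.

```perl
use strict;
use warnings;

# Hypothetical input: pull out the value that follows "user=".
my $line = "id=42; user=alice; role=admin";

# index/substr version: find the key, then the terminator, then slice.
my $user_by_index;
my $start = index($line, "user=");
if ($start != -1) {
    $start += length("user=");
    my $end = index($line, ";", $start);
    $end = length($line) if $end == -1;    # value runs to the end of the line
    $user_by_index = substr($line, $start, $end - $start);
}

# Regex version: one pattern captures everything up to the next ";".
my ($user_by_regex) = $line =~ /user=([^;]*)/;

print "index: $user_by_index\n";
print "regex: $user_by_regex\n";
```

Which version is easier to maintain depends on how comfortable the readers of the code are with regexes, which is the point made above.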

kjpires
  • 1
    The above comment has so much more charm when reading its author's nickname as @‌sin rather than @sln. Thanks, sans-serif fonts! – Adam Katz Jul 18 '18 at 18:13
0

If you need a substring match, use index. If you need a regexp match (with special meaning for regexp metacharacters), use =~. A substring match is usually faster, but regexps in Perl are quite well optimized, and simple regexp matches can be surprisingly fast. Benchmark it for yourself.

Since Perl 5.6, Perl recompiles the regexp in $str =~ /$str2/ only if $str2 has changed since the last compilation. To fully control when your regexp is compiled, use qr/$str2/. See "Does the 'o' modifier for Perl regular expressions still provide any benefit?" for m/.../o (obsolete) and qr/.../ (not needed most of the time, but can be useful).
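
As a minimal sketch of the `qr//` approach, assuming the pattern is fixed while many strings are tested against it (the `@lines` data and variable names are illustrative only):

```perl
use strict;
use warnings;

my $str2 = "test";

# Compile once, up front; \Q quotes metacharacters so $str2 is matched literally.
my $re = qr/\Q$str2\E/;

# Hypothetical input lines to test against the precompiled pattern.
my @lines = ("this is a test string", "no match here", "another test");
my @hits  = grep { $_ =~ $re } @lines;

print scalar(@hits), " of ", scalar(@lines), " lines matched\n";
```

Because the compiled object is stored in `$re`, it is explicit in the code when compilation happens, regardless of what later happens to `$str2`.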

pts
  • 7
    If `$str2` doesn't change, the pattern won't be recompiled, so `/o` is useless. If `$str2` does change, then pattern needs to be recompiled, so `/o` is wrong. `/o` is at best useless, so no, don't use `/o`! – ikegami Jun 10 '15 at 00:49
  • 3
    I want to emphasize that what @ikegami [said](http://stackoverflow.com/questions/30744379/perl-string-index-function-or-regex-which-is-better-and-when#comment49545977_30744417) applies, not just in this case, but in virtually all cases. See [this question](http://stackoverflow.com/q/550258/20938) for more info. – Alan Moore Jun 10 '15 at 01:13
  • Updated my answer about `/o`. – pts May 19 '20 at 17:15