
It used to be considered beneficial to include the 'o' modifier at the end of Perl regular expressions. The current Perl documentation does not even seem to list it, certainly not in the modifiers section of perlre.

Does it provide any benefit now?

It is still accepted, for reasons of backwards compatibility if nothing else.


As noted by J A Faucett and brian d foy, the 'o' modifier is still documented, if you find the right places to look (one of which is not the perlre documentation). It is mentioned in the perlop pages. It is also found in the perlreref pages.

As noted by Alan M in the accepted answer, the better modern technique is usually to use the qr// (quoted regex) operator.

brian d foy
Jonathan Leffler
  • The page you mention does describe the /o option, but only in the descriptions of qr// and m// operators. – J. A. Faucett Feb 15 '09 at 05:10
  • @J A Faucett: Hmm, I don't see it on that page, but can find it mentioned in the perlop (http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators) pages. It wasn't listed in the obvious place. I also found it listed in the perlreref (http://perldoc.perl.org/perlreref.html). – Jonathan Leffler Feb 15 '09 at 05:29

7 Answers


/o is effectively deprecated. The simplest way to make sure a regex is compiled only once is to use a regex object, like so:

my $reg = qr/foo$bar/;

The interpolation of $bar is done when the variable $reg is initialized, and the cached, compiled regex will be used from then on within the enclosing scope. But sometimes you want the regex to be recompiled, because you want it to use the variable's new value. Here's the example Friedl used in The Book:

sub CheckLogfileForToday()
{
  my $today = (qw<Sun Mon Tue Wed Thu Fri Sat>)[(localtime)[6]];

  my $today_regex = qr/^$today:/i; # compiles once per function call

  while (<LOGFILE>) {
    if ($_ =~ $today_regex) {
      ...
    }
  }
}

Within the scope of the function, the value of $today_regex stays the same. But the next time the function is called, the regex will be recompiled with the new value of $today. If he had just used:

if ($_ =~ m/^$today:/io)

...the regex would never be updated. So, with the object form you have the efficiency of /o without sacrificing flexibility.
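The contrast can be sketched with two small helpers. The names match_with_qr and match_with_o and the sample strings are hypothetical, not taken from Friedl's book. Each helper is called twice with a different prefix. The qr// version should follow the new prefix on the second call, while the /o version is expected to keep using the pattern compiled on its first call.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a fresh regex object on every call,
# so each call sees the current value of $prefix.
sub match_with_qr {
    my ($prefix, $line) = @_;
    my $re = qr/^\Q$prefix\E:/i;
    return $line =~ $re ? 1 : 0;
}

# Hypothetical helper: /o asks Perl to compile the pattern the first time
# this match is executed and to keep reusing it afterwards, whatever
# $prefix holds on later calls.
sub match_with_o {
    my ($prefix, $line) = @_;
    return $line =~ m/^\Q$prefix\E:/io ? 1 : 0;
}

my @qr_results = (match_with_qr('Mon', 'Mon: ok'), match_with_qr('Tue', 'Tue: ok'));
my @o_results  = (match_with_o('Mon', 'Mon: ok'),  match_with_o('Tue', 'Tue: ok'));

print "qr//: @qr_results\n";
print "/o:   @o_results\n";
```

If the second /o result comes back false, that is the stale pattern at work: the match is still looking for "Mon:".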

Alan Moore

The /o modifier is in the perlop documentation instead of the perlre documentation since it is a quote-like modifier rather than a regex modifier. That has always seemed odd to me, but that's how it is. Since Perl 5.20, it's now listed in perlre simply to note that you probably shouldn't use it.

Before Perl 5.6, Perl would recompile the regex even if the variable had not changed. It no longer does, so you don't need /o to avoid that recompilation anymore. You could use /o to compile the regex once despite further changes to the variable, but as the other answers noted, qr// is better for that.
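One place where qr// fits more naturally than /o is when several interpolated patterns each need to be compiled once. The following is a minimal sketch under assumed inputs: the keyword list, the %regex_for hash and the keywords_in helper are all hypothetical names made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: keywords that would normally come from configuration.
my @keywords = qw(error warning fatal);

# Compile each pattern exactly once, up front, and keep the regex objects.
my %regex_for = map { $_ => qr/\b\Q$_\E\b/i } @keywords;

# Hypothetical helper: returns the keywords found in a line, in list order.
sub keywords_in {
    my ($line) = @_;
    return grep { $line =~ $regex_for{$_} } @keywords;
}

my @found = keywords_in('Fatal: disk full (previous warning ignored)');
print "@found\n";   # prints "warning fatal"
```

A single m/\b$_\b/o inside that grep would be expected to stay fixed on whichever keyword it saw first, which is why the regex objects are stored instead.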

brian d foy

The Perl 5.20.0 documentation for perlre (http://perldoc.perl.org/perlre.html) states:

Modifiers

Other Modifiers

…

o - pretend to optimize your code, but actually introduce bugs

which may be a humorous way of saying it was supposed to perform some kind of optimisation, but the implementation is broken.

Thus the option might be best avoided.

Rhubbarb
  • The technical background is that m//o behaves differently under threaded and non-threaded perls. With threaded perls, m/$foo/o does not recompile the regex with a new value of $foo, but without threads it does. This is usually considered a bug. Until 5.005_02 you could change $foo at compile-time, but not at run-time anymore, which is more consistent (OPpRUNTIME). But this never got fixed, so now m/$foo/o is considered buggy with threaded perls. With v5.26 the OPpRUNTIME flag was totally removed without explaining the background of the problem. – rurban Jan 12 '17 at 16:53
  • Hmm, if you view the `/o` flag as a promise by the programmer that the regexp will not change, then I don't think either of these behaviours is a bug. The programmer said to the compiler: I can assure you that this regexp interpolating $foo will not change from when it is first compiled, so do whatever optimizations you see fit using that knowledge. The compiler can then choose to recompile the regexp or not (though clearly not recompiling it will usually be faster). That said, the original documentation for `/o` didn't really describe it in these terms. – Ed Avis May 30 '19 at 13:22

This is an optimization in the case that the regex includes a variable reference. It indicates that the regex does not change even though it has a variable within it. This allows for optimizations that would not be possible otherwise.

denis phillips
  • Optimization or bug? https://stackoverflow.com/questions/68454089/perl-regex-o-optimization-or-bug/68454090#68454090 – Tim Potapov Jul 22 '21 at 18:24

Here are timings for different ways to call matching.

$ perl -v | grep version
This is perl 5, version 20, subversion 1 (v5.20.1) built for x86_64-linux-gnu-thread-multi

$ perl const-in-re-once.pl | sort
0.200   =~ CONST
0.200   =~ m/$VAR/o
0.204   =~ m/literal-wo-vars/
0.252   =~ m,@{[ CONST ]},o
0.260   =~ $VAR
0.276   =~ m/$VAR/
0.336   =~ m,@{[ CONST ]},

My code:

#! /usr/bin/env perl

use strict;
use warnings;

use Time::HiRes qw/ tv_interval clock_gettime gettimeofday /;
use BSD::Resource qw/ getrusage RUSAGE_SELF /;

use constant RE =>
    qr{
        https?://
        (?:[^.]+-d-[^.]+\.)?
        (?:(?: (?:dev-)? nind[^.]* | mr02 )\.)?
        (?:(?:pda|m)\.)?
        (?:(?:news|haber)\.)
        (?:.+\.)?
        yandex\.
        .+
    }x;

use constant FINAL_RE => qr,^@{[ RE ]}(/|$),;

my $RE = RE;

use constant ITER_COUNT => 1e5;

use constant URL => 'http://news.trofimenkov.nerpa.yandex.ru/yandsearch?cl4url=www.forbes.ru%2Fnews%2F276745-visa-otklyuchila-rossiiskie-banki-v-krymu&lr=213&lang=ru';

timeit(
    '=~ m/literal-wo-vars/',
    ITER_COUNT,
    sub {
        for (my $i = 0; $i < ITER_COUNT; ++$i) {
            URL =~ m{
                ^https?://
                (?:[^.]+-d-[^.]+\.)?
                (?:(?: (?:dev-)? nind[^.]* | mr02 )\.)?
                (?:(?:pda|m)\.)?
                (?:(?:news|haber)\.)
                (?:.+\.)?
                yandex\.
                .+
                (/|$)
            }x
        }
    }
);

timeit(
    '=~ m/$VAR/',
    ITER_COUNT,
    sub {
        for (my $i = 0; $i < ITER_COUNT; ++$i) {
            URL =~ m,^$RE(/|$),
        }
    }
);

timeit(
    '=~ $VAR',
    ITER_COUNT,
    sub {
        my $r = qr,^$RE(/|$),o;
        for (my $i = 0; $i < ITER_COUNT; ++$i) {
            URL =~ $r
        }
    }
);

timeit(
    '=~ m/$VAR/o',
    ITER_COUNT,
    sub {
        for (my $i = 0; $i < ITER_COUNT; ++$i) {
            URL =~ m,^$RE(/|$),o
        }
    }
);

timeit(
    '=~ m,@{[ CONST ]},',
    ITER_COUNT,
    sub {
        for (my $i = 0; $i < ITER_COUNT; ++$i) {
            URL =~ m,^@{[ RE ]}(/|$),
        }
    }
);

timeit(
    '=~ m,@{[ CONST ]},o',
    ITER_COUNT,
    sub {
        for (my $i = 0; $i < ITER_COUNT; ++$i) {
            URL =~ m,^@{[ RE ]}(/|$),o
        }
    }
);

timeit(
    '=~ CONST',
    ITER_COUNT,
    sub {
        for (my $i = 0; $i < ITER_COUNT; ++$i) {
            URL =~ FINAL_RE
        }
    }
);

sub timeit {
    my ($name, $iters, $code) = @_;
    #my $t0 = [gettimeofday];
    my $t0 = (getrusage RUSAGE_SELF)[0];
    $code->();
    #my $el = tv_interval($t0);
    my $el = (getrusage RUSAGE_SELF)[0] - $t0;
    printf "%.3f\t%-17s\t%.9f\n", $el, $name, $el / $iters
}
Peter Mortensen
user55921
  • Welcome to Stack Overflow. Please read the [About] page. Thank you for running these timings. The first three results are likely to be essentially the same; the second three might also be 'the same', or there might be some gradation in the timings; the last result seems to be separate. Can you include the code you used in your tests so I can understand more exactly what you measured? If there was a data file, you shouldn't include it, but it would help to identify how big the file was: the number of lines and total number of bytes in it. – Jonathan Leffler Dec 26 '14 at 18:02

Yep and Nope

I ran a simple comparison using the following script:

perl -MBenchmark=cmpthese -E 'my @n = 1..10000; cmpthese(10000, {string => sub{"a1b" =~ /a\d+c/ for @n}, o_flag => sub{"a1b" =~ /a\d+c/o for @n}, qr => sub{my $qr = qr/a\d+c/; "a1b" =~ /$qr/ for @n } })'

Here are the results:

         Rate     qr string o_flag
qr      760/s     --   -72%   -73%
string 2703/s   256%     --    -5%
o_flag 2833/s   273%     5%     --

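So, in this benchmark, the /o flag is much faster than matching through a qr// object, and about 5% faster than the plain literal pattern. Note that neither the string nor the o_flag pattern interpolates a variable here.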

But apparently the /o flag may cause bugs: Perl regex /o optimization or bug?

Tim Potapov

One thing it mystifyingly does not do is allow a ONCE block, at least as of 5.8.8.

perl -le 'for (1..3){ print; m/${\(print( "between 1 and 2 only"), 3)}/o and print "matched" }'
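For run-once initialization, a state variable is one alternative, though it needs Perl 5.10 or later and so would not have helped on 5.8.8. This is a minimal sketch, not a replacement for the one-liner above. The matches_three helper and the $init_count counter are hypothetical names added only to make the single evaluation visible.

```perl
use strict;
use warnings;
use feature 'state';   # available from Perl 5.10 on

my $init_count = 0;    # counts how often the initializer block runs

# Hypothetical helper: the state initializer is evaluated only on the
# first call, so the regex is built once and reused afterwards.
sub matches_three {
    my ($text) = @_;
    state $re = do { $init_count++; qr/3/ };
    return $text =~ $re ? 1 : 0;
}

my @hits = grep { matches_three($_) } 1 .. 3;
print "initializer ran $init_count time(s); matched: @hits\n";
```

The helper is called three times, yet the do block that builds the regex runs only on the first call.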

Never Sleep Again