6

What's the best way to clear/reset all regex matching variables?

  • Example of how $1 isn't reset between regex operations and keeps the value from the most recent successful match:

    $_="this is the man that made the new year rumble"; 
    / (is) /; 
    / (isnt) /; 
    say $1;          # outputs "is"
    
  • Example of how this may be problematic when working with loops:

    foreach (...){
       /($some_value)/;
       &doSomething($1) if $1;
    }
    

Update: I didn't think I'd need to do this, but Example-2 is only an example. This question is about resetting matching variables, not the best way to implement them.

Regardless, my coding style was originally more in line with being explicit and using if-blocks. Coming back to Example 2 now, I find its syntax more concise and faster to comprehend when reading many lines of code.

vol7ron
  • 8
    This question is also "If I have to shoot myself, what should I aim for?" – brian d foy Apr 18 '12 at 21:45
  • @briandfoy: exactly :) - saw the sched for oscon, looking forward to seeing your pres. again. I think people are thinking I don't understand what an `if (//)` does and why it should be used. Often, when I'm on here it's because someone else didn't and I'm looking for the easiest way out of their code. In this case, my answer is the only one that answers the question I asked. *Correction: mine and ikegami*. – vol7ron Apr 18 '12 at 23:51
  • 2
    Then why don't you ask it that way? Explanation of the circumstances make stupid questions into good questions sometimes. – matthias krull Apr 19 '12 at 00:04
  • @mugenkenichi: I think I was pressed for time, but yeah you're probably right ;) – vol7ron Apr 19 '12 at 04:05
  • 2
    You can still edit your question to explain why you are asking it. But, if you are pressed for time, why would you post a speculative question at all? Shouldn't you have been doing other things than wasting other people's time? You seemed to have plenty of time to comment on other people's answers. – brian d foy Apr 19 '12 at 10:47
  • @briandfoy: The funny thing about time is that it changes and its availability is dependent on the task at hand. I have a limited amount of time at work, but when I get home or go to a cafe, I have a lot more to address comments/concerns. -- The question still hasn't changed, though. – vol7ron Apr 19 '12 at 12:33
  • @mugenkenichi: I did update the question (at the bottom). I forgot that SO is really horrid when it comes to asking specific questions about Perl. *I should have just gone straight to PerlMonks* as they seem to understand that when a person of a certain rating/level asks a seemingly dumb question, there is generally a reason, and that reason is generally a waste of time explaining/understanding. I find these kinds of problems all the time on SO, just like when someone asks a JavaScript question and users here want to offer "just do it in jQuery/Scriptaculous" or similar solutions off the bat. – vol7ron Apr 19 '12 at 12:42
  • Your edit does not explain your motives and I do not care for rating when the question just is not a good one. I did not mean to offend you personally and didn't downvote. – matthias krull Apr 19 '12 at 14:15
  • @mugenkenichi: I'm not offended :) As I said before, I actually am grateful for all the help and participation members here have given. (I'd probably get more frustrated with questions that don't get any attention). I hope I haven't come off as unappreciative because that is not the case. I genuinely respect each and every one of you. It looks like the real answer here is that, there's still no way to alter system/internal variables, which is probably still a good thing. – vol7ron Apr 19 '12 at 14:28

6 Answers

18

You should use the return from the match, not the state of the group vars.

foreach (...) {
    doSomething($1) if /($some_value)/;
}

$1, etc. are only guaranteed to reflect the most recent match if the match succeeds. You shouldn't be looking at them other than right after a successful match.

Mark Reed
  • You're missing the point. It was an example (only partial code). The question is about resetting the backreference. – vol7ron Apr 18 '12 at 20:48
  • 12
    The larger point is that you shouldn't be doing anything that relies on resetting the backreference. This was an example of how to avoid that reliance in the code you posted; if you have a different example, please post it so we can demonstrate how to avoid it there. Relying on the content of $1 in any context other than immediately after a successful match is a bug. Period. – Mark Reed Apr 18 '12 at 20:54
  • 3
    @vol7ron No. This is exactly on spot and documented behaviour. [perlvar](http://perldoc.perl.org/perlvar.html#Variables-related-to-regular-expressions) – matthias krull Apr 18 '12 at 20:55
  • Mark in general purpose you're right, but I have been programming Perl for many years and now am updating someone else's code. When you have heredocs all over the place and interpolation gone bad, it's the simple duct tape that's needed. Therefore, while your answer may be valid to a different question, it doesn't apply to the question asked. Which it seems, I have the only solution. – vol7ron Apr 18 '12 at 21:02
  • @mugenkenichi I know what the behavior is, I was pretty explicit when I wrote "*The example above shows that $1 isn't reset between regex operations and **uses the most recent match.***" That wasn't a question or an expression of confusion; it was a statement to address that I know what's happening. – vol7ron Apr 18 '12 at 21:04
  • 5
    `perlre` specifically states "Failed matches in Perl do not reset the match variables, which makes it easier to write code that tests for a series of more specific cases and remembers the best match." – JRFerguson Apr 18 '12 at 21:05
  • 2
    @vol7ron: Fair enough. In your situation I would still look for other relatively low-impact refactoring opportunities that didn't involve this sort of hack, but if you're determined to go that route... I don't suppose there's a scope difference, in which case you could declare $1 as local? Failing that, I think your own solution is the only option. – Mark Reed Apr 18 '12 at 21:08
  • 2
    ...or maybe you could not use $1 etc. at all, but assign the result of the match to a lexical var instead. If you do `my @a = /.../`, then $a[1] will be undef if the match fails. – Mark Reed Apr 18 '12 at 21:12
  • @MarkReed: that's kind of what I was curious about when I asked the question. Whether $1 is reset upon exiting a block, or if there are new Perl functions to act on internals. Basically, something that isn't yet common knowledge. – vol7ron Apr 18 '12 at 21:12
  • 1
    @vol7ron maybe those misunderstandings are the reason the `why would i have to do this?` should be included in questions sometimes :) – matthias krull Apr 18 '12 at 21:15
  • 1
    Put it this way, you only trust `$1` if there was a match. You might be interested in this: http://stackoverflow.com/questions/4045467/perl-match-outside-if-doesnt-reset-1-on-loop – Joel Berger Apr 18 '12 at 22:03
  • No. I think you all think I don't understand what's happening, which is not the case. I asked the question a specific way for a specific reason. Don't get me wrong, I appreciate the answers and have upvoted you all, but it doesn't answer *the question I asked*, which is a little frustrating. – vol7ron Apr 18 '12 at 23:53
  • Ok then: "I don't think there is a way, other than the normal Perl ways of clearing a variable" – Joel Berger Apr 19 '12 at 03:32
14

Regex captures* are reset by a successful match. To reset regex captures, one would use a trivial match operation that's guaranteed to match.

"a" =~ /a/;  # Reset captures to undef.

Yeah, it looks weird, but you asked to do something weird.
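The reset trick above can be checked with a minimal sketch. The helper name `capture_after_failed_match` and its `$reset` flag are made up for illustration; the sketch assumes the sample text matches `/ (is) /` and copies `$1` into a lexical before returning it.

```perl
use strict;
use warnings;

# Hypothetical helper: run a successful match, optionally the trivial
# "reset" match, then a failing match, and report what $1 holds.
sub capture_after_failed_match {
    my ($text, $reset) = @_;

    $text =~ / (is) /;        # succeeds for the sample text; $1 is "is"
    "a" =~ /a/ if $reset;     # always succeeds and has no groups, so $1 becomes undef
    $text =~ / (isnt) /;      # fails, so the captures are left untouched

    my $seen = $1;            # copy the capture before leaving the sub
    return $seen;
}

my $sample  = "this is the man that made the new year rumble";
my $stale   = capture_after_failed_match($sample, 0);   # stale "is"
my $cleared = capture_after_failed_match($sample, 1);   # undef
```

Without the reset the failed match leaves the stale "is" behind; with it, the caller gets undef.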

If you fix your code, you don't need weird-looking workarounds. Fixing your code even reveals a bug!

Fixes:

$_ = "this is the man that made the new year rumble"; 
if (/ (is) / || / (isnt) /) {
   say $1; 
} else{ 
   ...  # You're currently printing something random.
}

and

for (...) {
   if (/($some_pattern)/) {
      do_something($1);
   }
}

* — Backrefs are regex patterns that match previously captured text. e.g. \1, \k<foo>. You're actually talking about "regex capture buffers".

ikegami
  • I'd give you double points for correcting me, if I could. Though, the docs call it "matching variables" ;) – vol7ron Apr 18 '12 at 23:55
  • The only place I see "matching variables" is in [perlrequick](http://perldoc.perl.org/perlrequick.html) and [perlretut](http://perldoc.perl.org/perlretut.html). – brian d foy Apr 19 '12 at 10:46
  • @brian d foy, I see "match variables" in perlre (which makes more sense than "matching variables"), but I really dislike the term. It's ambiguous if not meaningless. "Capture buffer" is used prominently in the documentation (at least in perlre and perlvar). `$1` technically only "allows access to a capture buffer" rather than being a capture buffer itself, but that's hair splitting. – ikegami Apr 19 '12 at 18:44
  • Not all match variables are capture buffers though. – brian d foy Apr 25 '12 at 16:27
  • @brian d foy, Meaning what? The docs say `\1` and `k` access capture buffers, so both named and numbered captures are documented to access capture buffers. What does that leave? – ikegami Apr 25 '12 at 16:37
  • What do you mean "what does that leave"? perlre lists all of the variables involved with matching, and not all of them are captures, such as `@-` and `@+`. – brian d foy Apr 30 '12 at 21:44
  • @brian d foy, You're making my point that "match variables" is ambiguous. Your definition differs from the one in the docs. Only the variables that "allow access to capture buffers" are called "match variables" or "matching variables" by the docs. Not `@-` and `@+`. (Those are "Variables related to regular expressions".) This ambiguity is why I use "capture buffers" (or "captures" for short) even though that's slightly different from how the docs uses the term. (According to the docs `$1` isn't a capture buffer, but it "allows access to a capture buffer".) – ikegami May 01 '12 at 05:59
  • I think we agree here. I say "capture buffers" to mean the ones that capture, but use "match variables" to include the other ones. And, I'm the one who wrote "Variables related to regular expressions" in the docs. – brian d foy May 03 '12 at 16:19
  • 1
    Not so much showing off but noting that when I can change the docs, me arguing about what the docs say is a bit weird. It's like I'm cheating. – brian d foy Jan 07 '13 at 09:13
6

You should test whether the match succeeded. For example:

foreach (...){
   /($some_value)/ or next;
   doSomething($1) if $1;
}

foreach (...){
   doSomething($1) if /($some_value)/ and $1;
}

foreach (...){
   if (/($some_value)/) {
      doSomething($1) if $1;
   }
}

Depending on what $some_value is, and how you want to handle matching the empty string and/or 0, you may or may not need to test $1 at all.
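As a rough illustration of that last point, here is a sketch in which the match result alone decides whether the capture is used, so a captured "0" is not discarded. The helper name `captured_digits` and the sample inputs are assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the first digit of each item, testing the
# match itself rather than the truthiness of $1.
sub captured_digits {
    my @found;
    for my $item (@_) {
        next unless $item =~ /(\d)/;   # skip items that do not match
        push @found, $1;               # "0" is kept even though it is false
    }
    return @found;
}

my @digits = captured_digits("a0", "b", "c7");
```

With an extra `if $1` guard, the "0" captured from "a0" would have been skipped, which may or may not be what you want.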

cjm
  • You're missing the point. It was an example (only partial code). The question is about resetting the backreference. – vol7ron Apr 18 '12 at 20:48
  • 3
    @vol7ron you're missing the point. Stop reading the values of global variables when they're not valid. – hobbs Apr 18 '12 at 20:52
  • @hobbs: :) no. You, of all people, should know when code gets more complex you structure it differently to make it more understandable. The whole reason why you sometimes do `if () {...}` vs `do {...} if ()` – vol7ron Apr 18 '12 at 20:57
  • 3
    @vol7ron and when code gets more complex, you do everything you possibly can to avoid action at a distance, like using a `$1` that may or may not have been set at some time in the past. – hobbs Apr 18 '12 at 20:59
  • 2
    You can restructure code to make it more readable only in ways that don't break the code. Relying on the value of $1 when you didn't just have a successful match breaks the code. You should take this as a sign that, whatever you're trying to do, you need to do it differently. – Mark Reed Apr 18 '12 at 20:59
3

To complement the existing, helpful answers (notwithstanding the sensible recommendation to test the result of a matching operation in a Boolean context and act only if the test succeeds):

Depending on your scenario, you can approach the problem differently:

Disclaimer: I'm not an experienced Perl programmer; do let me know if there are problems with this approach.

Enclosing the matching operation in a do { ... } block scopes all regex-related special variables ($&, $1, ...) to that block.

Thus, you can use a do { ... } to prevent these special variables from getting set in the first place (although the ones from a previous regex operation outside the block will obviously remain in effect); for instance:

$_="this is the man that made the new year rumble"; 

# Match in current scope; -> $&, $1, ... *are* set.
/ (is) /;

# Match inside a `do` block; the *new* $&, $1, ... values
# are set only *inside* the block; 
# `&& $1` passes out the block's version of `$1`.
$do1 = do { / (made) / && $1 };

print "\$1 == '$1'; \$do1 == '$do1'\n";  # -> $1 == 'is'; $do1 == 'made'
  • The advantage of this approach is that none of the current scope's special regex variables are set or altered; the accepted answer, by contrast, alters variables such as $& and $'.
  • The disadvantage is that you must explicitly pass out variables of interest; you do get the result of the matching operation by default, however, and if you're only interested in the contents of capture buffers, that will suffice.
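As far as I can tell from perlvar, the regex variables are dynamically scoped to the enclosing block in general, so a bare block should behave the same way as `do { ... }`. A minimal sketch under that assumption (the variable names are made up for illustration):

```perl
use strict;
use warnings;

my $text = "this is the man that made the new year rumble";

$text =~ / (is) /;           # outer match: $1 is "is"
my $outer_before = $1;

my $inner;
{
    $text =~ / (made) /;     # inner match: $1 is "made" inside the block
    $inner = $1;             # pass the value out explicitly
}

my $outer_after = $1;        # the outer "is" is visible again after the block
```

As with the `do` version, anything you need from the inner match has to be copied out before the block ends.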
mklement0
1

You should do it this way:

foreach (...) { 
   someFnc($1) if /.../; 
}

But if you want to stick with your style, then check this as an idea:

$_ = "this is the man that made the new year rumble";  

$m = /(is)/   ? $1 : undef;
$m = /(isnt)/ ? $1 : undef;

print $m, "\n" if defined $m;
Ωmega
  • 2
    That's dirty, in that there are more idiomatic ways to deal with this problem. – matthias krull Apr 18 '12 at 20:58
  • 1
    `$&` should be avoided at all costs. In your example `$1` would have the exact same data and avoid a global performance penalty. – Ven'Tatsu Apr 18 '12 at 21:44
  • @Ven'Tatsu - Certainly we all know that `$&` is last match, so in this case it is `$1`, but can you please explain what peformance penalty you are talking about? – Ωmega Apr 18 '12 at 21:54
  • 3
    Read the [perlvar](http://perldoc.perl.org/perlvar.html) entry on `$&` to see the widely known and generally avoided performance penalty. – brian d foy Apr 18 '12 at 23:23
1

Assigning captures to a list behaves closer to what it sounds like you want.

for ("match", "fail") {
    my ($fake_1) = /(m.+)/;
    doSomething($fake_1) if $fake_1;
}
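The same idea extends to several captures. The sketch below assumes a made-up `key=value` line format and a hypothetical helper `parse_pair`; a failed match in list context returns an empty list, so both lexicals end up undef and nothing stale leaks in from an earlier match.

```perl
use strict;
use warnings;

# Hypothetical helper: split "key=value" into its two parts, or return
# (undef, undef) when the line does not have that shape.
sub parse_pair {
    my ($line) = @_;
    my ($key, $value) = $line =~ /^(\w+)=(\w*)$/;   # empty list on failure
    return ($key, $value);
}

my ($key, $value)     = parse_pair("mode=0");   # ("mode", "0")
my ($no_key, $no_val) = parse_pair("junk");     # (undef, undef)
```

Testing with `defined` rather than plain truthiness keeps values such as "0" or the empty string usable.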
Ven'Tatsu