#!/usr/bin/perl
@lines = `perldoc -u -f atan2`;
foreach (@lines) {
  s/\w<([^>]+)>/\U$1/g;
  print;
}

How does the expression s/\w<([^>]+)>/\U$1/g; work?

Martin York
Rock
  • The [regex explainer](http://rick.measham.id.au/paste/explain.pl) is a very useful tool. :) – Ted Hopp Jan 08 '13 at 06:37
  • @TedHopp That regex explainer seems to throw some hiccups with this regex. I assume that's because it cannot handle substitutions. – TLP Jan 08 '13 at 06:53
  • Shouldn't the stuff inside the `foreach` loop have `$_` somewhere? – slayedbylucifer Jan 08 '13 at 06:54
  • @slayedbylucifer The `$_` is used by default in substitutions and print. And other things as well. – TLP Jan 08 '13 at 06:56
  • @slayedbylucifer: That is a central concept to Perl, and the entire point of `$_` existing at all. – Borodin Jan 08 '13 at 07:33
  • @TLP - The regex explainer provides only a partial answer to OP's question. It explains just the regex itself (that is, `\w<([^>]+)>`), not the substitution expression (`\U$1`) or what the `g` at the end means. – Ted Hopp Jan 08 '13 at 14:18
  • @mep: Changing the title like that is not helping the original poster. I think he understands what he wants better than you. If you think the title is wrong then you should ask the poster. – Martin York Jan 08 '13 at 16:30

3 Answers


The substitution does this:

s/             
    \w<         # look for a single word character (letter, digit or underscore) followed by <
    ([^>]+)     # capture one or more characters that are not >
    >           # followed by a >
/               ### replace with
   \U           # change following text to uppercase
   $1           # the captured string from above
/gx             # /g: replace every match on the line, not just the first
                # /x: allow whitespace and comments in the pattern

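I added the /x modifier so the pattern could be laid out over several lines with comments. Note that /x applies only to the pattern: the annotated replacement half above is there for explanation, and in real code the replacement is just \U$1. The character class [^>] is negated, as denoted by the ^ right after the [, which means "any character except >".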
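A runnable version of the same substitution keeps the comments in the pattern, where /x allows them, and writes the replacement on one line. This is a sketch; the variable `$text` and its sample value are made up for illustration. The comments inside the pattern must not contain the `/` delimiter.

```perl
use strict;
use warnings;

# Sample POD-style text (illustrative only, not real perldoc output).
my $text = 'X<atan2> X<arctangent>';

$text =~ s/
    \w<         # a word character followed by <
    ([^>]+)     # capture one or more characters that are not >
    >           # the closing >
/\U$1/gx;       # replacement: uppercase the capture (no comments inside it)

print "$text\n";    # ATAN2 ARCTANGENT
```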

For example, this line in the output from the perldoc command:

X<atan2> X<arctangent> X<tan> X<tangent>

is changed to:

ATAN2 ARCTANGENT TAN TANGENT
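The effect of /g is visible on that sample line: without it, only the first X<...> would be converted. A sketch that uses the line above as a literal string rather than reading real perldoc output:

```perl
use strict;
use warnings;

my $line = 'X<atan2> X<arctangent> X<tan> X<tangent>';

# Copy the line, then substitute on the copy.
# Without /g only the first match is replaced.
(my $once = $line) =~ s/\w<([^>]+)>/\U$1/;

# With /g every match on the line is replaced.
(my $all = $line) =~ s/\w<([^>]+)>/\U$1/g;

print "$once\n";   # ATAN2 X<arctangent> X<tan> X<tangent>
print "$all\n";    # ATAN2 ARCTANGENT TAN TANGENT
```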
TLP

Here is another option for figuring out what it is doing: use the module YAPE::Regex::Explain from CPAN.

Use it like this (this covers only the match part of the search and replace):

use strict;
use YAPE::Regex::Explain;

print YAPE::Regex::Explain->new(qr/\w<([^>]+)>/)->explain();

This gives the following output:

The regular expression:

(?-imsx:\w<([^>]+)>)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \w                       word characters (a-z, A-Z, 0-9, _)
----------------------------------------------------------------------
  <                        '<'
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^>]+                    any character except: '>' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  >                        '>'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

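The replacement part of the expression, \U$1, takes the text matched between "group and capture to \1" and "end of \1" (available as $1 after the match) and converts it to uppercase. Because the entire match, including the leading \w character and the angle brackets, is replaced, only the uppercased capture is left.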
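To see what $1 holds for each match, the pattern can be run in list context with /g, which returns every capture. The /e variant below, which evaluates the replacement as Perl code, is an equivalent way of writing \U$1 with the uc function. A sketch; the sample string is made up.

```perl
use strict;
use warnings;

my $line = 'X<atan2> X<tan>';

# In list context, m//g returns the captured group from every match.
my @captured = $line =~ /\w<([^>]+)>/g;
print "@captured\n";    # atan2 tan

# \U$1 in the replacement gives the same result as uc($1) under /e.
(my $with_u  = $line) =~ s/\w<([^>]+)>/\U$1/g;
(my $with_uc = $line) =~ s/\w<([^>]+)>/uc $1/ge;
print "$with_u\n";      # ATAN2 TAN
print "$with_uc\n";     # ATAN2 TAN
```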

Bee

A Perl foreach loop normally looks like this:

foreach $item (@array)
{
   # Code in here. ($item takes a new value from array each iteration)
}

But Perl lets you leave out the variable in many places (loops, pattern matches and substitutions, and many builtins such as print).
When you do, the special variable $_ is used.
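A small sketch (the word list is made up) of a loop without a loop variable, together with two builtins that fall back to $_ when called with no argument:

```perl
use strict;
use warnings;

my @words = ('perl', 'regex');
my (@lengths, @upper);

for (@words) {               # no loop variable: each element is aliased to $_
    push @lengths, length;   # length() with no argument uses $_
    push @upper, uc;         # so does uc()
}

print "@lengths\n";   # 4 5
print "@upper\n";     # PERL REGEX
```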

So in your case:

foreach (@lines) 
{
}

Is exactly the same as:

foreach $_ (@lines) 
{
}

Now inside the body the following code:

s/\w<([^>]+)>/\U$1/g;

Works the same way. A substitution always operates on a variable, and when you do not bind one with =~, Perl defaults to $_.

Thus it is the equivalent of:

$_ =~ s/\w<([^>]+)>/\U$1/g;
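A quick sketch checking that the bare form and the explicit $_ =~ form behave the same. Each single-element for loop aliases $_ to one made-up string, so the substitution changes that variable:

```perl
use strict;
use warnings;

my $implicit = 'See X<atan2>';
my $explicit = 'See X<atan2>';

for ($implicit) { s/\w<([^>]+)>/\U$1/g }        # binds to $_ implicitly
for ($explicit) { $_ =~ s/\w<([^>]+)>/\U$1/g }  # binds to $_ explicitly

print "$implicit\n";   # See ATAN2
print "$explicit\n";   # See ATAN2
```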

Combine the two:

foreach (@lines) {
  s/\w<([^>]+)>/\U$1/g;
  print;
}

Is equivalent to:

foreach $item (@lines)
{
    $item =~ s/\w<([^>]+)>/\U$1/g;
    print $item;
}

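I use $item just for readability. It plays the same role as $_ in the original, and like $_ it is an alias for the current element of @lines rather than a copy.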
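Because a foreach loop variable (named or $_) aliases the current array element, the substitution edits @lines in place. A short sketch with made-up input lines standing in for the perldoc output:

```perl
use strict;
use warnings;

my @lines = ("X<atan2>\n", "X<tan> and X<tangent>\n");

foreach my $item (@lines) {
    # $item is an alias, so this modifies the element of @lines itself
    $item =~ s/\w<([^>]+)>/\U$1/g;
}

print @lines;   # prints "ATAN2" and "TAN and TANGENT" on separate lines
```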

Lots of Perl code uses this kind of shortcut. Personally I think it makes code harder to read, even for experienced Perl programmers (it's one of the reasons Perl got a reputation for unreadability). As a result I always try to be explicit about the variables I use, though this is not typical Perl usage.

Martin York
  • I think you should leave out the inflammatory statements about perl readability. It's not really difficult to remember that `for` without a target variable uses `$_`. – TLP Jan 08 '13 at 07:13
  • Agreed. These shortcuts and alternative ways of writing constructs allow a good coder to choose when to be verbose and when to be as short as possible, in order to make the code more readable and maintainable. Often writing something simple like `send_hello($_) foreach @people;` is very readable, whereas a loop containing multiple instructions is probably better written in full as `foreach my $person (@people) { ... }` – plusplus Jan 08 '13 at 11:04
  • @plusplus: Disagree. It is the ability to leave out variables that makes reading perl nearly impossible (you have to know the intent before you start). It takes a very good, disciplined programmer (who does not use the shortcuts) to write maintainable code in perl (and because most perl monkeys are not disciplined, the language has received a reputation as a write-once (never read again) language). – Martin York Jan 08 '13 at 16:11
  • @plusplus: On the other hand this has also made perl a very good resource as it keeps beginners out, and maintenance of projects in cpan is only done by the authors. So unlike php the code is usually good (even if you can't read it unless you are the author). – Martin York Jan 08 '13 at 16:14
  • @LokiAstari It is just because of this argumentative nature of your statement that I suggested you leave it out. I have no problem reading perl. http://stackoverflow.com/q/9591658/725418 If you can't read perl, then that's probably because you are not sufficiently skilled at it, not because authors at cpan write sloppy code. – TLP Jan 08 '13 at 16:24
  • @TLP: As you can see from the brilliant answer I can read perl fine. I also seem to be the only one who can actually read the question, which asks how the foreach works. – Martin York Jan 08 '13 at 16:28
  • @TLP: If you think `its one of the reason perl got a reputation for unreadability` is not a common feeling then you are not in touch with reality. Even perl programmers (like myself) understand that perl has a reputation (deserved or not). Trying to hide that helps nobody. We need to expose it, point out our opinions, and try to get the community to write better, more maintainable code. Trying to bury the problem helps nobody. – Martin York Jan 08 '13 at 16:37
  • @LokiAstari The title says one thing, the last line in the question another. You're the only one who chose to read it that way, is more accurate. – TLP Jan 08 '13 at 16:49
  • @TLP. Even the last line works for my interpretation. It does not ask to explain the regular expression. It asks how the regular expression will work. When you read that in context with the title it means how will this regular expression work in the context of the loop when there is no variable it seems to be operating on. Anyway there is little point arguing. When the OP returns he will either accept your explanation or mine and the discussion will be closed. – Martin York Jan 08 '13 at 16:53
  • @LokiAstari I am trying to adhere to the spirit of StackOverflow and not engage in pointless debates of this kind, but you seem hell bent on provoking me. Is there a problem here? Or is it just that you don't think I've heard bad computer jokes before, and you want to share? – TLP Jan 09 '13 at 09:57