
I have encountered the following code:

my $string = "fo2345obar";
$string =~ s<(\d+)><$1\.>g;

I cannot understand what the angle brackets mean in this context; all substitutions I have seen before were of the form:

$string =~ s/(\d+)/$1\./g;

What do angle brackets mean?

Mikhail

4 Answers


In Perl, you can use almost any non-whitespace character as the delimiter, so the following are all equivalent:

s/PATTERN/REPLACEMENT/
s=PATTERN=REPLACEMENT=
s,PATTERN,REPLACEMENT,

This is possible in sed, too.

But, in Perl, some of the delimiters are special because they come in pairs, e.g.

s{PATTERN}{REPLACEMENT}
s<PATTERN><REPLACEMENT>
s(PATTERN)[REPLACEMENT]

etc.
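A small sketch to confirm that these forms behave the same way. The sample string and variable names are made up for illustration and are not from the question.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the question.
my $input = "cat hat bat";

# The same substitution written with four different delimiter styles.
(my $slash  = $input) =~ s/at/og/g;    # the customary slashes
(my $comma  = $input) =~ s,at,og,g;    # a single non-paired delimiter
(my $braces = $input) =~ s{at}{og}g;   # paired delimiters
(my $mixed  = $input) =~ s(at)[og]g;   # two different pairs

print "$_\n" for $slash, $comma, $braces, $mixed;
```

Each copy of the string should come out identical, whichever delimiter is used.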

choroba

I'd like to add (to the numerous answers) that there can be two different pairs of delimiters in an s/// statement.

@choroba had an example on s()[] in their answer but didn't draw any particular attention to it, so I want to add this for emphasis.


Consider the regex replacement expression in the form s{aaa}{bbb}.

It's only if the first delimiter is non-paired that it will dictate the delimiter for the entire expression. Then the end-of-match and start-of-replacement delimiters in the middle bleed together (they're the same character), and instead of s#aaa##bbb# you get

s#aaa#bbb#

On the other hand, if you lead with paired delimiters for the match part, you're free to choose again for the replacement:

# not only
s{aaa}{bbb}

# but also
s{aaa}[bbb]

# and even
s{aaa}#bbb#
s{aaa}/bbb/

This is mostly useful for things like:

s{aaa}'$literalstring'

to avoid interpolation and escaping in the replacement part, while allowing interpolation in the match.


While this usually makes the code harder to take in (and is therefore not advisable), I'm making a note of it here for completeness's sake - you may come across it in code.
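As a rough illustration of the single-quote case described above, here is a minimal sketch. The variables $word, $cost and $text are hypothetical, and the behaviour shown relies on perlop's rule that single-quote delimiters suppress interpolation.

```perl
use strict;
use warnings;

# Hypothetical values, chosen only to make the difference visible.
my $word = "price";
my $cost = "fee";
my $text = "The price is unknown";

# Braces around the pattern: $word is interpolated as usual.
# Single quotes around the replacement: $cost is meant to stay literal.
(my $literal = $text) =~ s{$word}'$cost';

# Same pattern, but a slash-delimited replacement interpolates $cost.
(my $interpolated = $text) =~ s{$word}/$cost/;

print "$literal\n";
print "$interpolated\n";
```

The first result should contain the literal text $cost, while the second should contain the value of the variable.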

Silvar

The Perl documentation for regular expressions describes which delimiter characters can be used. A few examples of different delimiters:

$string =~ s/pattern/replacement/g;
$string =~ s{pattern}{replacement}g;
$string =~ s!pattern!replacement!g;
$string =~ s|pattern|replacement|g;
$string =~ s<pattern><replacement>g;
$string =~ s#pattern#replacement#g;
$string =~ s'pattern'replacement'g;
$string =~ s,pattern,replacement,g;

The most commonly used delimiter is /, but in some situations its use makes the pattern less readable:

$path = '/home/user1/dir1/dir2';

Compare these three:

$path =~ s/user1\/dir1\/dir2/user2\/dir3\/dir4/;
$path =~ s!user1/dir1/dir2!user2/dir3/dir4!;
$path =~ s#user1/dir1/dir2#user2/dir3/dir4#;
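The comparison can be run side by side with a sketch like the following, assuming the pattern is meant to match the user1 directory in $path. The $with_* variable names are made up for illustration.

```perl
use strict;
use warnings;

my $path = '/home/user1/dir1/dir2';

# Slash delimiter: every / inside the pattern and replacement needs a backslash.
(my $with_slash = $path) =~ s/user1\/dir1\/dir2/user2\/dir3\/dir4/;

# Other delimiters: the slashes can be written as they are.
(my $with_bang = $path) =~ s!user1/dir1/dir2!user2/dir3/dir4!;
(my $with_hash = $path) =~ s#user1/dir1/dir2#user2/dir3/dir4#;

print "$_\n" for $with_slash, $with_bang, $with_hash;
```

All three substitutions should produce the same path; only the readability differs.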

Please see perlrequick documentation to understand regular expressions and what symbols can be used as delimiters.

Perhaps you will find the following resources useful in your study of Perl programming.

Tutorial perl regular expressions

Stackoverflow question 21335765

Polar Bear

The traditional way to change foo to bar in a string is to do this:

$string =~ s/foo/bar/;

This is all well and good, but what if foo and bar contain lots of / characters? For example, what if you want to change http:// to https://? You'd have to escape the / characters with \, and so you'd see something like this:

$string =~ s/http:\/\//https:\/\//;

You have to admit that the above substitution is not easy on the eyes. It's not so easy to tell which / belong to s/// and which belong to http://. And the \ characters used to escape the / characters just make the whole line more difficult to understand.

Some people call this "Leaning Toothpick Syndrome," because all the / and \ characters look like leaning toothpicks.

The good news is that you're not required to use / when using s///. There are many other characters you can use instead, which make the following lines equivalent to the one above:

 $string =~ s#http://#https://#;
 $string =~ s@http://@https://@;
 $string =~ s!http://!https://!;
 $string =~ s|http://|https://|;

Because the regular expression delimiter is no longer the default /, you no longer have to escape out the /, making the whole line much easier to read.

You would have to escape out whichever delimiter you use, so to avoid "Leaning Toothpick Syndrome" I would advise using a delimiter that does not appear in your expression.

There are many characters you can use as delimiters; I won't list them here. However, there are four special bracket delimiters that come in pairs and surround your expressions, much like parentheses do. They are: ( and ), [ and ], { and }, and finally, < and >. You use them like this:

$string =~ s(http://)(https://);
$string =~ s<foo><bar>g;

A bit easier to read without all those toothpicks, isn't it?

Your substitution expression will be easier to read if the delimiter you use doesn't exist in the expression itself. (Otherwise, you will have to escape it out with \.)
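To see that the delimiter choice changes only the readability and not the result, here is a minimal sketch. The URL and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input; example.com is a placeholder host.
my $url = "http://example.com/index.html";

# Slash delimiter: the toothpicks are required.
(my $toothpicks = $url) =~ s/http:\/\//https:\/\//;

# Delimiters that do not occur in the expression: nothing to escape.
(my $hash   = $url) =~ s#http://#https://#;
(my $pipe   = $url) =~ s|http://|https://|;
(my $parens = $url) =~ s(http://)(https://);

print "$_\n" for $toothpicks, $hash, $pipe, $parens;
```

All four variables should end up holding the same https URL.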

What's a little strange with the example you gave:

$string =~ s<(\d+)><$1\.>g;

is that there are no / characters anywhere in the expression, so presumably it's no clearer than:

$string =~ s/(\d+)/$1\./g;

However, maybe the original coder thought that the two \ characters would clash visually with the / characters, and so wanted to avoid the / delimiters altogether.

Note: You didn't ask about this, but the $1\. part of the substitution needlessly escapes out the . character. You don't need the \ there, because the second part of a s/// substitution is not a regular expression -- it is only the replacement, and the . is never used there to match a character. So there's no point in escaping it in your case, as . always means a literal . there.

So the above line would be better written as:

$string =~ s/(\d+)/$1./g;
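A quick sketch to check that the escaped and unescaped replacements give the same result, using the string from the question. The names $escaped and $plain are made up for illustration.

```perl
use strict;
use warnings;

my $string = "fo2345obar";

# Original form from the question: angle brackets, with \. in the replacement.
(my $escaped = $string) =~ s<(\d+)><$1\.>g;

# Simplified form: slashes, with a plain . in the replacement.
(my $plain = $string) =~ s/(\d+)/$1./g;

print "$escaped\n";
print "$plain\n";
```

Both lines should print the digits followed by a single dot, since the replacement side is treated like a double-quoted string rather than a regular expression.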

I hope this helps!

J-L