echo [18%] | sed s:[\[%\]]::g

I'm really confused by this, because the exact same pattern successfully replaces [18%] in vim. I've also tested the expression in a few online regex tools, and they all say it will match the `[`, `%`, and `]` as intended. I have tried adding the `-r` option as well as surrounding the substitution command in quotes.

I know that there are other commands that I could use to accomplish this task, but I want to know why it is behaving this way so I can get a better understanding of sed.

John Kugelman

1 Answer

$ echo [18%] | sed s:[][%]::g
18

sed supports POSIX.2 regular expression syntax: basic (BRE) syntax by default, extended (ERE) syntax with the -r flag. In either syntax, you include a right square bracket in a bracket expression by making it the first character (after an optional ^). Backslashes do not help, because inside a bracket expression a backslash is just an ordinary character.
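
To make the difference concrete, here is a small sketch comparing the question's pattern with the corrected one. The variable names are only illustrative, the patterns are single-quoted so the shell hands them to sed untouched, and the outputs in the comments assume a POSIX-conforming sed such as GNU sed on Linux.

```shell
# In the question's pattern the backslashes are literal, so the bracket
# expression is the set {\, [, %} and the final ] is an ordinary character
# that must follow it. Only "%]" matches.
broken=$(echo '[18%]' | sed 's:[\[%\]]::g')

# With ] placed first, the set is {], [, %}.
fixed=$(echo '[18%]' | sed 's:[][%]::g')

# Extended syntax (-r) parses bracket expressions the same way.
fixed_ere=$(echo '[18%]' | sed -r 's:[][%]::g')

printf '%s\n' "$broken" "$fixed" "$fixed_ere"
# [18
# 18
# 18
```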

This is annoying because almost every other modern language and tool uses Perl or Perl-like regex syntax. POSIX syntax is an anachronism.

You can read about the POSIX.2 syntax in the regex(7) man page.

 A bracket expression is a list of characters enclosed in "[]". It normally
 matches any single character from the list (but see below). If the list begins
 with '^', it matches any single character (but see below) not from the rest of
 the list. If two characters in the list are separated by '-', this is shorthand
 for the full range of characters between those two (inclusive) in the collating
 sequence, for example, "[0-9]" in ASCII matches any decimal digit. It is
 illegal(!) for two ranges to share an endpoint, for example, "a-c-e". Ranges are
 very collating-sequence-dependent, and portable programs should avoid relying on
 them.

 To include a literal ']' in the list, make it the first character (following a
 possible '^'). To include a literal '-', make it the first or last character, or
 the second endpoint of a range. To use a literal '-' as the first endpoint of a
 range, enclose it in "[." and ".]" to make it a collating element (see below).
 With the exception of these and some combinations using '[' (see next
 paragraphs), all other special characters, including '\', lose their special
 significance within a bracket expression.
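
The placement rules quoted above for ']' after '^' and for a literal '-' can be tried out the same way. This is only a sketch with made-up input strings and variable names, again assuming a POSIX-conforming sed:

```shell
# ']' directly after '^' is literal: delete everything that is not
# ']', '[' or a digit.
keep=$(echo '[18%]' | sed 's:[^][0-9]::g')

# ']' first and '-' last are both literal: delete every ']' and '-'.
nodash=$(echo 'a-b]c' | sed 's:[]-]::g')

printf '%s\n' "$keep" "$nodash"
# [18]
# abc
```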
John Kugelman
  • Thank you for that explanation. However, shouldn't the `-r` option have taken care of that by instructing sed to use extended regular expressions? I'm using bash on a linux machine, by the way. – Garrett S Holbrook Aug 28 '15 at 00:58
  • The way to include `]` in a bracket expression is the same for both BREs and EREs (i.e. make it the first character), so using `-r` or not will make no difference. – Ed Morton Aug 28 '15 at 01:12
  • The quoted text applies to both varieties. See the last sentence in particular: "...all other special characters, including '\', lose their special significance within a bracket expression." Unfortunately, `-r` doesn't change this. – John Kugelman Aug 28 '15 at 01:14
  • Okay, that makes sense. Thanks guys! – Garrett S Holbrook Aug 28 '15 at 01:22