
I have checked that ^* and ^& match lines beginning with * and &, which I didn't expect since they are special characters. But ^[ doesn't work. Is this "standard" behavior? Is there any rationale behind this?

sed version used was "GNU sed 4.4".

Wiktor Stribiżew
jinawee
    There are actually 7 types of character to consider when you use regexps in sed: 1) BRE regexp metachars (need to be escaped to be literal), 2) ERE regexp metachars (are literal **unless** they are escaped or used with e.g. -E in some seds), 3) script delimiters (usually `/` but can be any), 4) backreference metachars (e.g. `\1`), 5) chars that mean different things in different contexts (e.g. `^` in `^x` vs `[^x]` vs `[x^y]`, or `-` in `a-c` vs `[a-c]` vs `[ac-]`), 6) chars with different meanings in different seds (e.g. `\<`) and 7) every other char (literal). Happy googling :-). – Ed Morton Aug 21 '18 at 12:02

2 Answers

See the "3.3 Overview of Regular Expression Syntax" section of the sed documentation.

The & char is not a special regex char, so it does not need escaping in a regex pattern. Note that & is special in the replacement part of an s command, where it refers to the whole match.
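As a small illustration of the difference between & in the pattern and & in the replacement, here is a sketch using a made-up input line (`&x` is only sample data, not something from the question):

```shell
# Sample input is hypothetical; any line starting with & will do.
line='&x'

# In the pattern, & is an ordinary character and matches itself.
# In the replacement, an unescaped & expands to the whole match.
wrapped=$(printf '%s\n' "$line" | sed 's/^&/<&>/')
echo "$wrapped"    # <&>x

# Written as \& in the replacement, it produces a literal ampersand instead.
literal=$(printf '%s\n' "$line" | sed 's/^&/\&amp;/')
echo "$literal"    # &amp;x
```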

The * is not special in GNU sed when it comes first in the pattern, directly after a leading ^ anchor if there is one (so ^* matches a * at the start of the line). The documentation notes:

POSIX 1003.1-2001 says that * stands for itself when it appears at the start of a regular expression or subexpression, but many nonGNU implementations do not support this and portable scripts should instead use \* in these contexts.

The [ starts a bracket expression and must be paired with a ] that closes it. In ^[ the bracket expression is never closed, hence the error. To match a literal [ at the start of a line, escape it: ^\[.
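The three cases can be tried side by side. This is only a sketch: the `sample` function and its four input lines are invented for the demonstration, and `\*` is used instead of a bare `*` because the quoted documentation recommends it for portable scripts.

```shell
# Hypothetical sample lines, one per leading character discussed above.
sample() { printf '%s\n' '*star' '&amp' '[bracket' 'plain'; }

star=$(sample | sed -n '/^\*/p')      # \* is the portable spelling of ^*
amp=$(sample | sed -n '/^&/p')        # & needs no escaping in a pattern
escaped=$(sample | sed -n '/^\[/p')   # a backslash makes [ literal
inlist=$(sample | sed -n '/^[[]/p')   # [ is also literal inside [...]

# Prints *star, &amp, [bracket, [bracket (one per line).
printf '%s\n' "$star" "$amp" "$escaped" "$inlist"
```

Both `^\[` and `^[[]` select the same line; which one to use is mostly a matter of readability.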

Wiktor Stribiżew

From POSIX.1-2017:

The sed utility shall support the BREs described in XBD Basic Regular Expressions, ... [sed]

The POSIX section on BREs states:

A BRE special character has special properties in certain contexts. Outside those contexts, or when preceded by a <backslash>, such a character is a BRE that matches the special character itself. The BRE special characters and the contexts in which they have their special meaning are as follows:

  • .[\: The <period>, <left-square-bracket>, and <backslash> shall be special except when used in a bracket expression (see RE Bracket Expression). An expression containing a '[' that is unescaped and is not part of a bracket expression produces undefined results.
  • *: The <asterisk> shall be special except when used:
    • In a bracket expression
    • As the first character of an entire BRE (after an initial '^', if any)
    • As the first character of a subexpression (after an initial '^', if any); see BREs Matching Multiple Characters
  • ^: The <circumflex> shall be special when used as an anchor (see BRE Expression Anchoring). The <circumflex> shall signify a non-matching list expression when it occurs first in a list, immediately following a <left-square-bracket> (see RE Bracket Expression).
  • $: The <dollar-sign> shall be special when used as an anchor.

source: Basic Regular Expressions, Special characters
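The contexts listed for the asterisk can be checked with a few substitutions. This is a sketch, assuming a sed that follows the quoted rules (GNU sed does; the first answer notes that some other implementations may not). The `try` helper and the input strings are made up for the demonstration.

```shell
# Hypothetical helper: apply one sed script ($2) to one input line ($1).
try() { printf '%s\n' "$1" | sed "$2"; }

try 'aaab' 's/a*/X/'          # Xb    (* repeats the preceding a)
try 'a*b'  's/*/X/'           # aXb   (* first in the BRE: literal)
try '*ab'  's/^*/X/'          # Xab   (* right after the ^ anchor: literal)
try 'a*b'  's/\(*\)b/<\1>/'   # a<*>  (* first in a subexpression: literal)
try 'a*b'  's/[*]/X/'         # aXb   (* inside a bracket expression: literal)
```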

So to answer the OP's question using the above:

  • & is not a special character, so ^& is expected to work.
  • [ should always be escaped when it does not start a bracket expression; an unescaped, unpaired [ such as in ^[ produces undefined results.
  • * is not special directly after an initial ^ when the latter is an anchor, so ^* matches a literal *.

So everything the OP observed is consistent with the standard.

There is however still an interesting paragraph in RE Bracket Expression:

A bracket expression is either a matching list expression or a non-matching list expression. It consists of one or more expressions: ordinary characters, collating elements, collating symbols, equivalence classes, character classes, or range expressions. The <right-square-bracket> ( ] ) shall lose its special meaning and represent itself in a bracket expression if it occurs first in the list (after an initial <circumflex> ( ^ ), if any). Otherwise, it shall terminate the bracket expression, unless it appears in a collating symbol (such as [.].] ) or is the ending <right-square-bracket> for a collating symbol, equivalence class, or character class. The special characters ., *, [, and \ ( <period>, <asterisk>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within a bracket expression.

source: Basic Regular Expressions, RE Bracket Expression

This implies that ] cannot be escaped with a backslash inside a bracket expression, because the backslash loses its special meaning there.

The following work:

$ echo '[]' | sed 's/[^]x]/a/'
a]
$ echo '[]' | sed 's/[^x[.].]]/a/'
a]

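but a backslash does not do what one might expect:

$ echo '[]' | sed 's/[^x\]]/a/'
a

Here [^x\] is the whole bracket expression (any character except x and backslash) and the final ] is an ordinary character, so the pattern matches the two characters [] and replaces both.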

So in a bracket expression, don't escape ]: put it first in the list or write it as the collating symbol [.].].
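A few more substitutions show the same rules at work. This is a sketch under the POSIX bracket-expression rules quoted above; the `try` helper and the input strings are invented for illustration.

```shell
# Hypothetical helper: apply one sed script ($2) to one input line ($1).
try() { printf '%s\n' "$1" | sed "$2"; }

try 'a]b[c' 's/[]]/X/g'     # aXb[c  (] first in the list is literal)
try 'a]b[c' 's/[][]/X/g'    # aXbXc  (] first, then [)
try 'a]b[c' 's/[^][]/X/g'   # X]X[X  (negated list: ] right after ^)
try 'a\]b'  's/[\]]/X/'     # aXb    ([\] matches the backslash, then a literal ])
```

The last line is the counterpart of the failing example: the backslash is an ordinary list member, so the pattern consumes the backslash and the ] together.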

kvantour