Suppose I have the following regex in grep:

grep -E 'head \x1E tail'

I can make grep match non-printable or non-ASCII characters using its own escape sequence, \x.

Can I do the same in Bash, without calling any external program? Bash provides three ways to match patterns:

  1. plain pathname expansion
  2. extglob pathname expansion
  3. [[ string =~ regex ]]

None of these appears to support UTF escape codes, and neither does Bash itself.

davide

1 Answer

In Bash, you can use an ANSI-C quoted string ($'...'):

$ x=éclair
$ [[ $x =~ $'\xc3\xa9' ]] && echo matched
matched

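Note that \x specifies a single byte, so here you have to spell out the UTF-8 encoding of the character byte by byte. Bash 4.2 and later also accept \uHHHH and \UHHHHHHHH inside $'...' to name a Unicode code point directly; older versions do not.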
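Applied to the 0x1E byte from the question, the same quoting might look like the sketch below. The variable names (sep, s, re, regex_result, glob_result) are made up for illustration. The escape is expanded by Bash when the $'...' string is evaluated, not by the regex engine, so a regex stored in a variable has to be built from the already-expanded value.

```shell
#!/usr/bin/env bash
# Illustrative sketch: all variable names here are hypothetical.

sep=$'\x1e'              # ANSI-C quoting: Bash turns \x1e into the byte 0x1E
s="head ${sep} tail"

# [[ string =~ regex ]] with the regex held in a variable.
# The byte is already in $re; the regex engine never sees a backslash escape.
re="head ${sep} tail"
regex_result="no match"
[[ $s =~ $re ]] && regex_result="matched"

# Pattern (glob) matching; quoting "$sep" makes it match literally.
glob_result="no match"
[[ $s == *"$sep"* ]] && glob_result="matched"

echo "regex: $regex_result"
echo "glob:  $glob_result"
```

Assigning regex='\x1e' with ordinary single quotes stores the four literal characters backslash, x, 1, e instead, which is why that form is not expanded.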

chepner
  • I'm not able to insert your example directly in my console, as it refuses any non-ascii. However, Bash (4.2.0) doesn't seem to expand the `\x` escape; for example this regex doesn't match: `x=$(echo -e '\x1e'); regex='\x1e'; [[ $x =~ $regex ]] && echo matched` – davide Dec 02 '12 at 00:08
  • I see that inserting `'\x1e'` directly (without passing through the expansion of a variable) *does* expand to UTF characters. That's interesting, and new to me. I've never seen this in `man bash`. What is this called, or could you point me to a reference? – davide Dec 02 '12 at 00:24