30

I've got these three files:

$ cat pattern-ok 
['\-]
$ cat pattern-buggy 
[\-']
$ cat text 
abc'def-ghi

Now, is the following a bug, or a regexp feature I don't know about?

$ cat text | grep -f pattern-ok 
abc'def-ghi
$ cat text | grep -f pattern-buggy
grep: Invalid range end

I'm using:

$ grep --version | head -n 1
grep (GNU grep) 2.20
Michal Rus
  • Found this via Google; it turns out this error is sometimes unexpectedly caused by a glibc bug. https://bugzilla.redhat.com/show_bug.cgi?id=583011 – polm23 Aug 23 '17 at 05:00
  • I got this because I'd written `[A-z0-9a-z]` and grep was right (the first z should be uppercase) –  Feb 12 '21 at 19:21

2 Answers

51

This happens because the hyphen sits between two other characters inside the bracket expression, so grep reads it as a range, and that range happens to be invalid.

You are basically doing

grep "[\-']" file

grep interprets this as a range of characters to match, as in grep "[a-z]" file. But the range from \ to ' is invalid because \ sorts after ' in the character set, so the range would run backwards, hence the error.
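
To see why the range runs backwards, here is a small sketch that prints the code points of the two endpoints. It assumes an ASCII-based C locale and relies on the POSIX printf convention that an argument starting with a quote yields the numeric value of the next character; the variable names are only illustrative.

```shell
# Code points of the two range endpoints (assumes an ASCII-based C locale).
# A leading quote in a printf argument asks for the numeric value
# of the character that follows it.
start=$(LC_ALL=C printf '%d' "'\\")   # backslash, the range start
end=$(LC_ALL=C printf '%d' "''")      # single quote, the range end
echo "start=$start end=$end"

if [ "$start" -gt "$end" ]; then
    echo "the range start sorts after the range end, so the range is invalid"
fi
```

A range such as a-z passes the same check because a has the smaller code point.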

Why does the other one work, you may ask? Because there you are doing:

grep "['\-]" file

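In this case you are looking for any of the characters ', \ or - in the file. The hyphen comes last, so it cannot start a range, and the backslash is just an ordinary member of the list.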
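
A quick way to check that the backslash really is an ordinary member of the list is to feed grep a string that contains one. This is a sketch assuming GNU grep in the C locale; the sample string a\b-c is made up for the illustration.

```shell
# Assumes GNU grep; LC_ALL=C keeps bracket expressions byte-based.
# After shell quoting, the pattern grep sees is ['\-]:
# a quote, a backslash, and a trailing literal hyphen.
matches=$(printf '%s\n' 'a\b-c' | LC_ALL=C grep -o "['\\-]")
printf '%s\n' "$matches"
```

With -o, grep prints each matched character on its own line, so both the backslash and the hyphen should show up.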

Here is another example, where I want to find the characters a, - or 3 in a given string:

$ echo "23-2" | grep -o '[a-3]'
grep: Invalid range end
$ echo "23-2" | grep -o '[a3-]'
3
-
$ echo "23-2" | grep -o '[a3\-]'
3
-

So the underlying problem is that you have a sequence of the form "some character, -, another character" inside a [] block, and grep tries to read it as the range of characters between the first character and the second.


How can you solve it?

If you want to match a literal - among other characters, put it at the edge of the bracket expression, as the first or last item.
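
As a minimal sketch of both placements, using the text from the question and assuming GNU grep in the C locale (the variable names are only illustrative):

```shell
# Sample text from the question.
text="abc'def-ghi"

# Hyphen as the last item of the bracket expression.
last=$(printf '%s\n' "$text" | LC_ALL=C grep -o "['-]")

# Hyphen as the first item of the bracket expression.
first=$(printf '%s\n' "$text" | LC_ALL=C grep -o "[-']")

printf 'hyphen last:\n%s\n' "$last"
printf 'hyphen first:\n%s\n' "$first"
```

Both variants should report the same two characters, the quote and the hyphen, without any range error.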

From man grep:

Character Classes and Bracket Expressions

A bracket expression is a list of characters enclosed by [ and ]. It matches any single character in that list; if the first character of the list is the caret ^ then it matches any character not in the list. For example, the regular expression [0123456789] matches any single digit.

Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive, using the locale's collating sequence and character set. For example, in the default C locale, [a-d] is equivalent to [abcd]. Many locales sort characters in dictionary order, and in these locales [a-d] is typically not equivalent to [abcd]; it might be equivalent to [aBbCcDd], for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value C.

Finally, certain named classes of characters are predefined within bracket expressions, as follows. Their names are self explanatory, and they are [:alnum:], [:alpha:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:]. For example, [[:alnum:]] means the character class of numbers and letters in the current locale. In the C locale and ASCII character set encoding, this is the same as [0-9A-Za-z]. (Note that the brackets in these class names are part of the symbolic names, and must be included in addition to the brackets delimiting the bracket expression.) Most meta-characters lose their special meaning inside bracket expressions. To include a literal ] place it first in the list. Similarly, to include a literal ^ place it anywhere but first. Finally, to include a literal - place it last.
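
The three placement rules at the end of that excerpt can be combined in one bracket expression. The following is a sketch assuming GNU grep in the C locale, with a made-up sample string:

```shell
# In the bracket expression []^-] :
#   ] comes first, so it is a literal closing bracket
#   ^ is not first, so it is a literal caret
#   - comes last, so it is a literal hyphen
found=$(printf '%s\n' 'a]b^c-d' | LC_ALL=C grep -o '[]^-]')
printf '%s\n' "$found"
```

Each of the three special characters should be printed on its own line.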

fedorqui
  • So how do I escape the `-` to be treated as a normal character? Is putting it at the end of `[]` group the only way? – Michal Rus Nov 05 '14 at 10:07
  • Yes, this will solve it. You can put `-` at the front or the back of the expression, so that it won't be treated as a range. – fedorqui Nov 05 '14 at 10:10
  • So there's *no way* to just escape it and put anywhere in the `[]` group? :O Pretty weird. For all my life I've been using it like `[abc\-def]` and didn't know the real meaning. Thank you. – Michal Rus Nov 05 '14 at 10:12
  • Yes, it is pretty weird indeed. I just added to the answer a reference in `man grep` where it exactly states this. – fedorqui Nov 05 '14 at 10:17
  • Specifically, the range from "backslash" to "single quote" is invalid because the ASCII code for "backslash" is *greater than* that of "single quote". The range `a-z` is OK because a is *smaller* than z, ASCII-wise. – glenn jackman Feb 23 '19 at 15:25
12

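To avoid the range interpretation, you can use fgrep (that is, grep -F), which treats the pattern as a fixed string instead of a regular expression. Note that the brackets are then taken literally as well.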
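
A short sketch of what that means in practice, using the text from the question. It uses grep -F rather than fgrep, since recent GNU grep releases mark fgrep as obsolescent; the variable names are only illustrative.

```shell
# Sample text from the question.
text="abc'def-ghi"

# With -F the whole pattern is one literal string, brackets included,
# so this looks for the five characters [\-'] in a row and finds nothing.
literal=$(printf '%s\n' "$text" | grep -F "[\\-']" || true)

# To match either character with fixed strings, pass one -e per string.
either=$(printf '%s\n' "$text" | grep -F -e "'" -e "-")

printf 'literal pattern matched: [%s]\n' "$literal"
printf 'either character matched: [%s]\n' "$either"
```

So -F sidesteps the range error, but it is only a drop-in replacement when you want to search for literal strings rather than a set of characters.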