I have a regex in my code that is meant to match URLs, and it threw an error:

/^(http|https):\/\/([\w-]+\.)+[\w-]+([\w- .\/?%&=]*)?$/

The error was "empty range in char class". I traced the cause to the ([\w- .\/?%&=]*)? part: Ruby seems to treat the - in \w- . as a range operator instead of a literal -. Escaping the dash solved the problem.

But the original regular expression ran fine on my co-workers' machines. We use the same versions of OS X, Rails, and Ruby: Ruby 1.9.3p194, Rails 3.1.6, and OS X 10.7.5. After we deployed the code to our Heroku server, everything worked fine there too. Why did only my environment raise an error for this regex? How does Ruby interpret regular expressions?

sawa
Steve
  • 5
    I don't know why it worked on one machine and not on another, but hyphens in character classes should always be either escaped or at the beginning or end of the character class. Otherwise the engine might decide to make it a range. Hyphens are also allowed directly after other ranges (like `[A-Z-_]`) but this is rather discouraged, too, I'd say. – Martin Ender Oct 31 '12 at 15:57
  • 2
    What version of Ruby? Is it an earlier version with the optional regex support compiled in? Without at least some details about versions, and possibly the OS, it's impossible to help. – Dave Newton Oct 31 '12 at 15:58
  • Thank you guys for your help. To Dave: ruby version is ruby 1.9.3p194, rails is 3.1.6 and osx is 10.7.5. I'm not sure if my ruby comes with other optional regex support. Can you share your thoughts please? – Steve Oct 31 '12 at 16:18
  • 3
    It's standard regex practice to place the dash at the end of the character class. – Mark Thomas Oct 31 '12 at 17:29

1 Answer

I can replicate this error on Ruby 1.9.3p194 (2012-04-20 revision 35410) [i686-linux], installed on Ubuntu 12.04.1 LTS using rvm 1.13.4. However, this should not be a version-specific error. In fact, I'm surprised it worked on the other machines at all.

A simpler demonstration that fails just as well:

"abcd" =~ /[\w- ]/

This is because [\w- ] is interpreted as a range running from "any word character" to a space, rather than as a character class containing a word character, a hyphen, or a space, which is what you intended.
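To see the difference without relying on the ambiguous pattern, here is a minimal sketch that compares the three unambiguous spellings of the intended class. The constant names and the compile_status helper are made up for illustration. The helper builds patterns with Regexp.new from a string, on the assumption that a bad pattern then surfaces as a rescuable RegexpError instead of stopping the script when the file is parsed. The result for [\w- ] is deliberately not stated, since it appears to vary between Ruby builds.

```ruby
# Three unambiguous spellings of "word character, hyphen, or space".
HYPHEN_FIRST   = /[-\w ]/   # hyphen at the start of the class
HYPHEN_LAST    = /[\w -]/   # hyphen at the end of the class
HYPHEN_ESCAPED = /[\w\- ]/  # hyphen escaped in the middle

[HYPHEN_FIRST, HYPHEN_LAST, HYPHEN_ESCAPED].each do |pattern|
  puts "#{pattern.inspect} matches '-': #{!!(pattern =~ '-')}"
end

# Hypothetical helper: report whether a pattern source compiles.
def compile_status(source)
  Regexp.new(source)
  "compiled"
rescue RegexpError => e
  "rejected (#{e.message})"
end

# The ambiguous class from the question; no outcome is assumed here.
puts compile_status('[\w- ]')

# A reversed range is rejected, which gives a point of comparison.
puts compile_status('[z-a]')
```

If the ambiguous pattern compiles on one machine and is rejected on another, the script reports that without crashing, which makes it easy to compare environments.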

Per Ruby's regular expression documentation:

Within a character class the hyphen (-) is a metacharacter denoting an inclusive range of characters. [abcd] is equivalent to [a-d]. A range can be followed by another range, so [abcdwxyz] is equivalent to [a-dw-z]. The order in which ranges or individual characters appear inside a character class is irrelevant.
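The equivalences in that passage can be checked directly. This is a small sketch, with LETTERS and matched_letters as hypothetical names introduced only for the demonstration:

```ruby
LETTERS = ('a'..'z').to_a

# Hypothetical helper: the lowercase letters that a pattern matches.
def matched_letters(pattern)
  LETTERS.select { |ch| pattern =~ ch }
end

p matched_letters(/[a-d]/)                                       # ["a", "b", "c", "d"]
p matched_letters(/[abcd]/) == matched_letters(/[a-d]/)          # true
p matched_letters(/[abcdwxyz]/) == matched_letters(/[a-dw-z]/)   # true

# Order of ranges inside the class does not change what is matched.
p matched_letters(/[w-za-d]/) == matched_letters(/[a-dw-z]/)     # true
```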

As you saw, prepending a backslash escapes the hyphen, so it is read as a literal character rather than as a range operator, which removes the error. However, escaping a hyphen in the middle of a character class is not recommended, since it is easy to confuse the intended meaning of the hyphen in such cases. As m.buettner pointed out, always place hyphens at either the beginning or the end of a character class:

"abcd" =~ /[-\w ]/
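Applied to the URL pattern from the question, that advice might look like the sketch below. The only change is that the hyphen in the last character class moves to the end. The constant name URL_PATTERN and the sample URLs are assumptions for illustration.

```ruby
# The question's pattern with the hyphen moved to the end of the final
# character class, so it can only be read as a literal hyphen.
URL_PATTERN = /^(http|https):\/\/([\w-]+\.)+[\w-]+([\w .\/?%&=-]*)?$/

# Sample URLs are hypothetical.
puts !!(URL_PATTERN =~ "http://example.com/some-path?x=1")  # true
puts !!(URL_PATTERN =~ "ftp://example.com")                 # false
```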
Arman H