Over the years, "regex" pattern matching has been getting more and more powerful to the point where I wonder: is it really just context-sensitive-grammar matching? Is it a variation/extension of context-free-grammar matching? Where is it right now and why don't we just call it that instead of the old, restrictive "regular expression"?
3 Answers
In particular backreferences to capturing parentheses make regular expressions more complex than regular, context-free, or context-sensitive grammars. The name is simply historically grown (as many words). See also this section in Wikipedia and this explanation with an example from Perl.

- 44,988
- 7
- 85
- 112
-
Could you please explain the difference between `regular language` and `regular expression`? – Christian Klauser Mar 04 '09 at 22:18
-
2Is it really more powerful than CSG? Could you give an example? – notnot Mar 04 '09 at 22:22
-
A regular language can be described by a regular grammar (see http://en.wikipedia.org/wiki/Regular_grammar), while regular expressions are a pattern matching language that is less restricted and therefore more complex to process. – Fabian Steeg Mar 04 '09 at 22:25
-
Thanks for your comment notnot, I've added a link to a sample and some details. – Fabian Steeg Mar 04 '09 at 22:41
-
Hmmm... does this mean that we can match against arbitrary CSG's using today's tools. – notnot Mar 04 '09 at 22:51
-
Hm, not really sure what you are asking but I believe no, it simply means some regular expressions can't even be described with a CSG and so can't be parsed by a linear bounded automaton. – Fabian Steeg Mar 04 '09 at 23:02
-
Where are the boundary lines, then, on our regex tools? How can we write expressions that defy the limits of CSG's and yet not be able to express all CSG's? – notnot Mar 04 '09 at 23:14
-
I don't know, I just meant to say that the fact that some things work that cannot be expressed by a CFG does not necessarily imply that all things that can be will work too. – Fabian Steeg Mar 04 '09 at 23:50
-
Hey, I like the idea of defining "regex" as distinct from "regular expressions", and not just a contraction. Problem is, it still looks like a contraction. – 13ren Mar 05 '09 at 00:42
-
@notnot, not all sets/pairs have a strict order (which is a subset of which out of `{1,2,3}` and `{1,2,4}` - the answer is neither). Similarly it is quite easy to show that (for e.g.) Parsing Expression Grammars can match some languages that cannot be matched by CFGs (e.g. `a^nb^nc^n` is not context free but can be parsed by a PEG) and simultaneously there is not a PEG equivalent for all CFGs. – tobyodavies Feb 27 '11 at 13:51
The way I see it:
- Regular languages:
- Matched by state machines. Only one variable can be used to represent the current "location" in the grammar to be matched: Recursion cannot be implemented
- Context-free languages:
- Matched by a stack machine. The current "location" in the grammar is represented by a stack in one or another form. Cannot "remember" anything that occurred before
- Context-sensitive languages:
- Most programming languages
AllMost human languages
I do know of regular expression parsers that allow you to match against something the parser has already encountered, achieving something like a context-sensitive grammar.
Still, regular expression parsers, however sophisticated they may be, don't allow for recursive application of rules, which is a definite requirement for context-free grammars.
The term regex, in my opinion, mostly refers to the syntax used to express those regular grammars (the stars and question marks).

- 4,416
- 3
- 31
- 42
-
Lookahead/lookbehind and naming definitely add something that sits outside of standard regular expressions - memory. So aren't we at PDA level? – notnot Mar 04 '09 at 22:28
-
1It's not in general true that natural language is context-sensitive, see http://www.eecs.harvard.edu/~shieber/Biblio/Papers/shieber85.pdf – Fabian Steeg Mar 04 '09 at 22:54
There are features in modern regular expression implementations that break the rules of the classic regular expression definition.
For example Microsoft’s .NET Balancing Group (?<
name1
-
name2
> … )
:
^(?:0(?<L>)|1(?<-L>))*(?(L)(?!))$
This does match the language L₀₁ = {ε, 01, 0011, 000111, … }. But this language is not regular according to the Pumping Lemma.

- 643,351
- 109
- 780
- 844
-
I know that it goes beyond classic regex, but I'm wondering how much further. Fabian's link above is interesting. – notnot Mar 04 '09 at 22:49