
Why can't repeated strings such as {wcw | w is a string of a's and b's} be denoted by regular expressions? Please give me a detailed answer, as I am new to lexical analysis. Thanks.

Chad Birch
paragjain
  • Bear in mind that parsing is the main subject of one of the hardest courses I took in grad school (Compilers I). There's a pretty good answer already, but you may not have the background to make use of it. – David Thornley Mar 05 '09 at 20:54
  • Well, it wasn't easy. But at least it was fun, sometimes. Although here it included optimization as well as several algorithms beyond parsing. Any ideas how to make that post clearer to someone without much background? -.- – Joey Mar 05 '09 at 21:43

2 Answers


Regular expressions in their original form describe regular languages. Such languages cannot contain arbitrarily nested structures, because every regular language can be recognized by a simple finite state machine. Put simply, you can picture each word of the language as growing strictly from left to right (or right to left), where repeating structures have to be defined explicitly and are static.

What this means is that only a fixed, finite amount of information (the current state) can be carried over from earlier input to later input, a few characters further on. So if your input starts with a string w, you can't specify that exactly the same string w must appear again later in the sequence, because w can be arbitrarily long. Similarly, you can't ensure that each opening parenthesis has a matching closing parenthesis (so the syntax of regular expressions is itself not a regular language and thus cannot be described by a regular expression :-)).
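To make the "finite state machine" idea concrete, here is a minimal sketch in Python. The transition table, state numbers, and function name are made up for illustration. It recognizes the shape "some a's and b's, then one c, then some a's and b's". Note that the only thing the machine remembers is which of three states it is in, so it has no way to compare the two halves.

```python
# Hypothetical DFA for the shape [ab]*c[ab]* (all names are illustrative).
# States: 0 = still before the c, 1 = after the c, 2 = dead (rejecting) state.
TRANSITIONS = {
    (0, "a"): 0, (0, "b"): 0, (0, "c"): 1,
    (1, "a"): 1, (1, "b"): 1, (1, "c"): 2,
    (2, "a"): 2, (2, "b"): 2, (2, "c"): 2,
}
ACCEPTING = {1}


def dfa_accepts(text):
    """Run the DFA over text; the current state is the machine's only memory."""
    state = 0
    for ch in text:
        # Any symbol outside {a, b, c} sends the machine to the dead state.
        state = TRANSITIONS.get((state, ch), 2)
    return state in ACCEPTING


print(dfa_accepts("abcab"))  # True: right shape, and the halves happen to match
print(dfa_accepts("abcba"))  # True: right shape, but the halves differ
print(dfa_accepts("abab"))   # False: no c at all
```

Adding more states lets the machine remember more, but any fixed number of states can only distinguish a bounded number of different prefixes, while w can be arbitrarily long.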

In theoretical computer science we worked with a very restricted set of regex operators, basically consisting only of sequence (concatenation), alternation (|) and repetition (*); everything else can be described with those operations.
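As a rough illustration of "everything else can be described with those operations", the sketch below compares a few convenience operators against rewrites that use only sequence, | and *. The pattern pairs and helper names are my own choices, and the check is only a brute-force comparison over short strings, not a proof.

```python
import re
from itertools import product

# Each pair: (convenience syntax, same language using only sequence, | and *).
EQUIVALENTS = [
    ("a+", "aa*"),         # one or more
    ("a?", "(a|)"),        # optional: a or the empty string
    ("[ab]", "(a|b)"),     # character class
    ("a{2,3}", "aa(a|)"),  # bounded repetition
]


def all_strings(alphabet, max_len):
    """Yield every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def agree(sugar, core, alphabet="ab", max_len=5):
    """Check that both patterns accept exactly the same short strings."""
    return all(
        bool(re.fullmatch(sugar, s)) == bool(re.fullmatch(core, s))
        for s in all_strings(alphabet, max_len)
    )


for sugar, core in EQUIVALENTS:
    print(sugar, "vs", core, "->", agree(sugar, core))
```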

However, regex engines usually allow grouping sub-patterns into matches that can then be referenced or extracted later. Some engines even allow such a backreference to be used in the search expression itself, which lets the expression describe more than just regular languages. If I remember correctly, such use of backreferences can even yield languages that are not context-free.
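For example, Python's `re` module is one engine that supports backreferences in the pattern itself. A small sketch (the constant and function names are just for illustration) of how the wcw language from the question could be checked that way:

```python
import re

# Group 1 captures w; \1 then demands exactly the same text again after the c.
# This goes beyond what a "pure" regular expression can express.
WCW = re.compile(r"([ab]*)c\1")


def is_wcw(text):
    """Return True if text has the form w c w, with w made of a's and b's."""
    return WCW.fullmatch(text) is not None


print(is_wcw("abcab"))  # True
print(is_wcw("abcba"))  # False: second half differs from the first
print(is_wcw("c"))      # True: w is the empty string
```

The backreference is exactly the "memory" that a plain finite state machine lacks, which is why such patterns are regular expressions in name only.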


Joey
  • Right. The wcw example above can't be done using a context-free grammar as far as I can see (certainly not if it's wcwcw), but it's easy to check it in Perl. – David Thornley Mar 05 '09 at 20:53

The general form can be; you just can't ensure that it's the same string of "a"s and "b"s, because there's no way to retain the information acquired in traversing the first half for use in traversing the second.
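A sketch of that split in Python, with names invented for the example: a pure regular expression checks the overall shape, and the "is it the same string?" part is done by ordinary code that is free to remember the first half.

```python
import re

# Pure regular expression: only checks "a's and b's, one c, a's and b's".
SHAPE = re.compile(r"[ab]*c[ab]*")


def has_wcw_shape(text):
    """The part a regular expression can do: verify the general form."""
    return SHAPE.fullmatch(text) is not None


def is_exact_wcw(text):
    """The part it cannot do: compare the two halves, using extra memory."""
    if not has_wcw_shape(text):
        return False
    left, _, right = text.partition("c")
    return left == right


print(has_wcw_shape("abcba"))  # True: the shape is fine
print(is_exact_wcw("abcba"))   # False: the halves differ
print(is_exact_wcw("abcab"))   # True
```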

MarkusQ