
First off, I don't study computer science; I'm just interested in the subject.

A parser basically does this, right?

  1. reads the input
  2. creates tokens
  3. parses the tokens and builds an AST
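Here is a minimal Python sketch of the three steps. It uses a hypothetical toy grammar (sums of integers with parentheses) that does not come from the question. `tokenize`, `Parser`, and `parse` are illustrative names, and the AST is just nested tuples.

```python
# Hypothetical toy grammar (an assumption for illustration):
#   expr := term ('+' term)*
#   term := NUMBER | '(' expr ')'

def tokenize(text: str) -> list:
    """Steps 1 and 2: read the characters and group them into tokens."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in "0123456789":
            j = i
            while j < len(text) and text[j] in "0123456789":
                j += 1
            tokens.append(("NUM", int(text[i:j])))
            i = j
        elif ch in "+()":
            tokens.append((ch, ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at {i}")
    return tokens


class Parser:
    """Step 3: recursive descent over the token list, building a tuple AST."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Kind of the next token, or None at end of input.
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind):
        if self.peek() != kind:
            raise ValueError(f"expected {kind!r} at token {self.pos}")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() == "+":
            self.expect("+")
            node = ("add", node, self.term())
        return node

    def term(self):
        if self.peek() == "(":
            self.expect("(")
            node = self.expr()  # recursion is what handles nesting
            self.expect(")")
            return node
        return ("num", self.expect("NUM")[1])


def parse(text: str):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise ValueError("unexpected trailing input")
    return tree
```

For example, `parse("1 + (2 + 3)")` returns `("add", ("num", 1), ("add", ("num", 2), ("num", 3)))`. The scanner never looks at nesting. Only the parser's recursion (`term` calling `expr`) deals with it.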

So I thought that to decide whether a word is in a regular language you use an FSM, and for context-free (CF) languages you need a parser because of the recursive structures they may contain. Hence scanner generators exist for regular languages and parser generators for CF languages.
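To illustrate the FSM side, here is a sketch of a deterministic finite automaton for one regular language. The example language is identifiers of the form `[A-Za-z_][A-Za-z0-9_]*`, chosen only for illustration. The automaton has a fixed set of three states and no stack, and a finite set of states is all any regular language needs.

```python
def is_identifier(s: str) -> bool:
    """DFA with states 'start', 'ident' (accepting) and 'reject' (trap state)."""
    state = "start"
    for ch in s:
        is_letter = ch == "_" or "a" <= ch <= "z" or "A" <= ch <= "Z"
        is_digit = "0" <= ch <= "9"
        if state == "start":
            state = "ident" if is_letter else "reject"
        elif state == "ident":
            state = "ident" if (is_letter or is_digit) else "reject"
        else:
            break  # 'reject' never leaves itself, so stop early
    return state == "ident"
```

For example, `is_identifier("foo_1")` is `True`, while `"1foo"` and `""` are rejected.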

But now I read that you can build a recursive descent parser for regular expressions:

http://matt.might.net/articles/parsing-regex-with-recursive-descent/
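The linked article builds such a parser. What follows is an independent Python sketch in the same spirit, not the article's code. It assumes a tiny regex syntax (literal characters, `\` escapes, grouping with `(...)`, alternation `|`, and postfix `*`) and returns a tuple-based AST. The key point is that `base` calls back into `regex` for a parenthesized group. A finite automaton cannot do that kind of recursion.

```python
# AST nodes are plain tuples (an illustrative choice):
#   ("empty",) ("char", c) ("seq", a, b) ("alt", a, b) ("star", a)

class RegexParser:
    def __init__(self, pattern: str):
        self.pattern = pattern
        self.pos = 0

    def peek(self):
        return self.pattern[self.pos] if self.pos < len(self.pattern) else None

    def eat(self, ch: str):
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def take(self) -> str:
        ch = self.peek()
        if ch is None:
            raise ValueError("unexpected end of pattern")
        self.pos += 1
        return ch

    # regex ::= term ('|' regex)?
    def regex(self):
        left = self.term()
        if self.peek() == "|":
            self.eat("|")
            return ("alt", left, self.regex())
        return left

    # term ::= factor*
    def term(self):
        node = ("empty",)
        while self.peek() not in (None, "|", ")"):
            factor = self.factor()
            node = factor if node == ("empty",) else ("seq", node, factor)
        return node

    # factor ::= base '*'*
    def factor(self):
        node = self.base()
        while self.peek() == "*":
            self.eat("*")
            node = ("star", node)
        return node

    # base ::= '(' regex ')' | '\' char | char
    def base(self):
        if self.peek() == "(":
            self.eat("(")
            node = self.regex()  # recursion: this is where nesting is handled
            self.eat(")")
            return node
        if self.peek() == "*":
            raise ValueError(f"'*' with nothing to repeat at position {self.pos}")
        if self.peek() == "\\":
            self.eat("\\")
        return ("char", self.take())


def parse_regex(pattern: str):
    parser = RegexParser(pattern)
    tree = parser.regex()
    if parser.peek() is not None:
        raise ValueError(f"unmatched ')' at position {parser.pos}")
    return tree
```

For example, `parse_regex("a(b|c)*")` returns `("seq", ("char", "a"), ("star", ("alt", ("char", "b"), ("char", "c"))))`. The resulting tree could then be compiled into an automaton (for instance with Thompson's construction) and used to match text. Parsing the pattern and matching strings against it are two separate jobs.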

So how does this all fit together?

Why do I need to parse regular languages? I thought a finite state machine was enough?

If, e.g., I want to recognize block comments (/* ... */) in a Java program, I only need to write an FSM, basically a switch-case statement. I don't need a parser for this...
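Below is a Python sketch of such an FSM, written as an if/elif chain in place of a switch statement. It assumes Java-like comment syntax and treats `//` as starting a line comment. For brevity it ignores string and character literals, so a `"/*"` inside a string would be misread. The state names are hypothetical.

```python
# States of the hypothetical comment-finding FSM.
CODE, SLASH, BLOCK, STAR, LINE = range(5)

def find_block_comments(src: str) -> list:
    """Return the text of every /* ... */ comment, tracking only a state variable."""
    comments = []
    state = CODE
    start = 0
    for i, ch in enumerate(src):
        if state == CODE:
            if ch == "/":
                state = SLASH
        elif state == SLASH:
            if ch == "*":
                state, start = BLOCK, i - 1  # the comment began at the '/'
            elif ch == "/":
                state = LINE                 # '//' starts a line comment instead
            else:
                state = CODE
        elif state == BLOCK:
            if ch == "*":
                state = STAR
        elif state == STAR:
            if ch == "/":
                comments.append(src[start:i + 1])
                state = CODE
            elif ch != "*":                  # another '*' keeps us in STAR
                state = BLOCK
        elif state == LINE:
            if ch == "\n":
                state = CODE
    return comments
```

No stack or recursion is needed, because Java block comments do not nest.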

Thanks for help and clarification!

user3629892

1 Answer


There is a difference between what a regular expression can match and what you need in order to parse a regular expression. Regular expressions can contain nested groups, for instance, so you can't parse them with a regular expression. You have to "count" nested pairs of parentheses, for example, which is beyond the capabilities of a regular language.
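Here is a small Python sketch of the point about counting. The function names are hypothetical, and escaped parentheses such as `\(` are ignored for simplicity. `parens_balanced` keeps an unbounded depth counter. `parens_balanced_fsm` is a genuine finite automaton with states `0..max_depth`, so it must reject balanced input that nests deeper than that. Every FSM has some fixed number of states, so some nesting depth always defeats it.

```python
def parens_balanced(s: str) -> bool:
    """Checks nesting with an unbounded counter, i.e. more memory than any FSM has."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '('
                return False
    return depth == 0


def parens_balanced_fsm(s: str, max_depth: int) -> bool:
    """A true finite automaton: only the states 0..max_depth exist."""
    state = 0
    for ch in s:
        if ch == "(":
            if state == max_depth:
                return False  # no state left to remember a deeper level
            state += 1
        elif ch == ")":
            if state == 0:
                return False
            state -= 1
    return state == 0
```

With `max_depth=2`, the automaton accepts `"(())"` but rejects the perfectly balanced `"((()))"`.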

See also: Is there a regular language to represent regular expressions?

BlackJack
  • Aah... so the syntax of regular expressions is not itself a regular language, only what the expressions match is? So to check whether something is in a regular language you use an FSM, but the regular expression itself has to be parsed? But still: I only need a parser for CF languages, and for regular languages an FSM is enough? – user3629892 Mar 31 '15 at 15:27
  • What is a "parser" here? For instance, the C function `scanf()` is used to parse string representations into basic data types like numbers or strings. JavaScript has a `parseInt()` function to parse a string into a number. Neither needs more than an FSM-like implementation to parse its input. And yes, for parsing a regular language an FSM is enough. – BlackJack Mar 31 '15 at 15:49
  • Hmm, okay. I thought a "parser" meant that there is a syntax tree... and that something like parseInt would be called a scanner... – user3629892 Apr 08 '15 at 14:24
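The point about `parseInt()` can be illustrated with a hypothetical Python helper. It reads an optional sign and then digits, using three states, and stops at the first character that has no transition. It is only a sketch. JavaScript's real `parseInt()` also skips leading whitespace, accepts a radix, and returns `NaN` instead of `None`.

```python
from typing import Optional

def parse_int_prefix(text: str) -> Optional[int]:
    """Hypothetical parseInt-like helper built as a three-state FSM."""
    state = "start"  # start -> sign -> digits
    sign, value = 1, 0
    for ch in text:
        if state == "start" and ch in "+-":
            sign = -1 if ch == "-" else 1
            state = "sign"
        elif "0" <= ch <= "9":
            value = value * 10 + (ord(ch) - ord("0"))
            state = "digits"
        else:
            break  # no transition for this character: stop scanning
    return sign * value if state == "digits" else None
```

For example, `parse_int_prefix("42px")` gives `42` and `parse_int_prefix("abc")` gives `None`. The helper produces a single value rather than a tree, which is why some would call this scanning rather than parsing in the compiler-textbook sense.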