Trying to understand parsing and scanning (difference between regular languages and context-free languages)

Question

First off, I don't study computer science; I'm just interested in the subject.

A parser basically does the following, right?

read the input
create tokens
parse the tokens and build an AST
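
The three steps listed above can be sketched roughly as follows. This is only an illustration for a toy arithmetic language, not taken from any particular tool: the names `tokenize`, `parse`, `parse_sum`, `parse_product` and `parse_atom`, as well as the nested-tuple AST shape, are assumptions made for this example.

```python
import re

# Hypothetical token pattern for this sketch: integers, +, *, and parentheses.
TOKEN_RE = re.compile(r"\s*(\d+|[+*()])")

def tokenize(text):
    """Scanner: turn the raw input into a flat list of tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(tokens):
    """Parser: build a nested-tuple AST from the token list."""
    ast, rest = parse_sum(tokens)
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return ast

def parse_sum(tokens):
    # sum := product ('+' product)*
    left, tokens = parse_product(tokens)
    while tokens and tokens[0] == "+":
        right, tokens = parse_product(tokens[1:])
        left = ("+", left, right)
    return left, tokens

def parse_product(tokens):
    # product := atom ('*' atom)*
    left, tokens = parse_atom(tokens)
    while tokens and tokens[0] == "*":
        right, tokens = parse_atom(tokens[1:])
        left = ("*", left, right)
    return left, tokens

def parse_atom(tokens):
    # atom := NUMBER | '(' sum ')'
    if not tokens:
        raise SyntaxError("unexpected end of input")
    head, rest = tokens[0], tokens[1:]
    if head == "(":
        inner, rest = parse_sum(rest)  # the recursion is what handles nesting
        if not rest or rest[0] != ")":
            raise SyntaxError("missing ')'")
        return inner, rest[1:]
    if head.isdigit():
        return int(head), rest
    raise SyntaxError(f"unexpected token {head!r}")
```

Here the scanner only needs a regular expression per token, while the parser needs recursion for the parenthesised sub-expressions. For example, `parse(tokenize("1 + 2 * (3 + 4)"))` builds a tree in which the multiplication sits below the outer addition.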

So I thought that in order to decide whether a word is in a regular language, you use a finite state machine (FSM), and for context-free (CF) languages you need a parser because of the recursive structures that may exist. Hence, scanner generators exist for regular languages and parser generators for CF languages.

But now I read that you can build a recursive descent parser for regular expressions:

http://matt.might.net/articles/parsing-regex-with-recursive-descent/

So how does this all fit together?

Why do I need to parse regular languages? I thought a finite state machine was enough?

If, e.g., I want to recognize block comments in a Java program (i.e. /* ... */), I only need to write an FSM, so basically a switch-case statement. I don't need a parser for this...
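
As a rough illustration of that switch-case idea, here is a minimal Python sketch of such a state machine. The function name `find_block_comments` and the four state names are made up for this example, and it deliberately ignores string literals and `//` line comments, which a real Java scanner would also have to handle.

```python
def find_block_comments(source):
    """Return (start, end) index pairs of /* ... */ comments in source."""
    # Four states are enough; no stack or counter is needed.
    CODE, SLASH, COMMENT, STAR = range(4)
    state = CODE
    start = 0
    spans = []
    for i, ch in enumerate(source):
        if state == CODE:
            if ch == "/":
                state = SLASH            # might be the start of "/*"
        elif state == SLASH:
            if ch == "*":
                state, start = COMMENT, i - 1
            elif ch != "/":
                state = CODE             # false alarm, back to normal code
        elif state == COMMENT:
            if ch == "*":
                state = STAR             # might be the start of "*/"
        elif state == STAR:
            if ch == "/":
                spans.append((start, i + 1))
                state = CODE
            elif ch != "*":
                state = COMMENT          # the "*" was part of the comment text
    return spans
```

Slicing the source with the returned pairs, e.g. `source[start:end]`, gives back each comment including its delimiters. An unterminated comment simply produces no pair in this sketch.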

Thanks for help and clarification!

score 1 · Answer 1 · answered Mar 31 '15 at 12:50

There is a difference between what a regular expression can match and what you need in order to parse a regular expression. Regular expressions can contain nested groups, for instance, so you can't parse them with a regular expression. You have to "count" nested pairs of parentheses to arbitrary depth, which is beyond the capabilities of a regular language.
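
To make the counting argument concrete, here is a small sketch in Python. The helper `parens_balanced` is hypothetical and ignores escaped parentheses and character classes; the point is only that `depth` can grow without bound, so no fixed, finite set of states can keep track of it.

```python
def parens_balanced(pattern):
    """Check that every '(' in a regex-like pattern has a matching ')'."""
    depth = 0  # unbounded counter: this is what a finite state machine lacks
    for ch in pattern:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ')' appeared before its '('
    return depth == 0
```

A recursive descent parser does the same bookkeeping implicitly: each opening parenthesis triggers a recursive call, and the call stack plays the role of the counter.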

See also: Is there a regular language to represent regular expressions.

BlackJack

aah... so regular expressions themselves are not a regular language, only what they match is? So for checking whether something is in a regular language, you use an FSM, but the regular expression itself has to be parsed? But still: I only need a parser for CF languages? For regular languages, an FSM is enough? – user3629892 Mar 31 '15 at 15:27
What is a "parser" here? For instance, the C function `scanf()` is used to parse string representations into different basic data types like numbers or strings. JavaScript has a `parseInt()` function to parse a string into a number. Neither needs more than an FSM-like implementation to parse the input. And yes, for parsing a regular language an FSM is enough. – BlackJack Mar 31 '15 at 15:49
hmm okay.. I thought a "parser" meant that there is a syntax tree... and in the case of parseInt, e.g., it was called a scanner... – user3629892 Apr 08 '15 at 14:24
