How to convert a regular grammar to regular expression?

Question

Is there an algorithm or tool to convert regular grammar to regular expression?

You can look into http://www.regexmagic.com/ if easily creating the expression is your purpose. – Myrtle, Jan 17 '12 at 16:25
My goal is to convert a regular grammar to a DFA. Finally, I found an excellent tool: http://www.jflap.org/jflaptmp/ . – dalibocai, Jan 17 '12 at 16:47
@dalibocai: could you update your title '.. to DFA', answer it and mark as answer for better search results? TY – Mat M, Apr 23 '14 at 11:37

Stephan · Accepted Answer · 2022-08-10T17:15:54.863

Answer from dalibocai:

My goal is to convert a regular grammar to a DFA. Finally, I found an excellent tool: JFLAP.

A tutorial is available here: https://www2.cs.duke.edu/csed/jflap/tutorial/framebody.html

edited Aug 10 '22 at 17:15

answered Jun 28 '13 at 08:39

Stephan

The link is broken. – Benedikt S. Vogler Aug 02 '22 at 11:47
@BenediktS.Vogler Link fixed. Thanks for your comment. – Stephan Aug 10 '22 at 17:16

score 1 · Answer 2 · answered Oct 08 '13 at 14:59

The algorithm is pretty straightforward once you can compute an automaton from your regular expression. For instance, for (aa*b|c), an automaton would be (arrows go to the right):

          a
         / \
      a  \ / b
-> 0 ---> 1 ---> 2 ->
    \___________/
          c

Then just "enumerate" your transitions as rules. Below, consider that 0, 1, and 2 are nonterminal symbols, and of course a, b and c are the tokens.

0: a1 | c2
1: a1 | b2
2: epsilon

or, if you don't want empty right-hand sides:

0: a1 | c
1: a1 | b

And of course, going in the other direction gives one way to convert a regular grammar into an automaton, and hence into a rational (i.e. regular) expression.
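
The "enumerate your transitions as rules" step can be sketched in a few lines of Python. This is only an illustration: the function name `automaton_to_grammar` and the dictionary encoding of the automaton are assumptions made for this example, not part of any tool mentioned in this thread.

```python
from collections import defaultdict

def automaton_to_grammar(transitions, finals):
    """Turn each transition p --x--> q into a rule 'p: xq'.

    Final states additionally get an epsilon rule.
    (Hypothetical helper; the encoding is an assumption of this sketch.)
    """
    rules = defaultdict(list)
    for (state, symbol), target in transitions.items():
        rules[state].append(f"{symbol}{target}")
    for state in finals:
        rules[state].append("epsilon")
    return dict(rules)

# The automaton drawn above for (aa*b|c): states 0, 1, 2; state 2 is final.
transitions = {(0, "a"): 1, (0, "c"): 2, (1, "a"): 1, (1, "b"): 2}
grammar = automaton_to_grammar(transitions, finals={2})

for state in sorted(grammar):
    print(f"{state}: {' | '.join(grammar[state])}")
# 0: a1 | c2
# 1: a1 | b2
# 2: epsilon
```

The epsilon-free variant would additionally replace every `x2` by `x` and drop the rule for state 2, which works here because state 2 has no outgoing transitions.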

akim

Regular grammars can provide loops that cannot be easily translated backwards: `({A, B, C, D}, {a, b, c, d}, {A -> aB, B -> bC, C -> cA, C -> cD, D -> dB, D -> d}, A)`, where you have the loop `A -> B -> C -> A` and the loop `B -> C -> D -> B` that overlap. – CodenameLambda Nov 03 '19 at 18:48
@CodenameLambda I don't understand what you mean. It is straightforward to build an automaton from your grammar, and the state elimination method, or just any aut-to-exp method will give you a result. Loops, overlapping or not, are irrelevant. – akim Nov 05 '19 at 18:37
Your answer only gives a rough description for an algorithm from regex to regular grammar, which can be followed backwards most of the time, but not always. So while there are methods that can properly translate every regular grammar to a regular expression, reversing the process you described can't handle all regular grammars. – CodenameLambda Nov 17 '19 at 15:00

score 1 · Answer 3 · answered Jan 23 '21 at 14:50

From a theoretical point of view, an algorithm to solve this problem works by creating a regular expression from each rule in the grammar, and solving the resulting system of equations for the initial symbol.

For example, for regular grammar ({S,A},{a,b,c},P,S):

P:
   S -> aA | cS | a  | c
   A -> aA | c  | bS

Take each non-terminal symbol and generate a regular expression from its right-hand side:
```
S = aA + cS + a + c
A = aA + bS + c
```

Solve the equation system for the initial symbol S:

A = a(aA + bS + c) + bS + c
A = a⁺bS + a⁺c + bS + c

S = aA + c(aA + cS + a + c) + a + c
S = aA + c⁺aA + c⁺a + a + c⁺

S = a(a⁺bS + a⁺c + bS + c) + c⁺a(a⁺bS + a⁺c + bS + c) + c⁺a + a + c⁺
S = a⁺bS + a⁺c + c⁺a⁺bS + c⁺a⁺c + c⁺a + a + c⁺

S = (c⁺ + ε)a⁺bS + a⁺c + a + c⁺(a⁺c + a + ε)

substitution: x = (c⁺ + ε)a⁺b

S = x(xS + a⁺c + a + c⁺(a⁺c + a + ε)) + a⁺c + a + c⁺(a⁺c + a + ε)
S = x⁺(a⁺c + a) + x⁺c⁺(a⁺c + a + ε) + a⁺c + a + c⁺(a⁺c + a + ε)
S = x*(a⁺c + a + c⁺(a⁺c + a + ε))

S = ((c⁺ + ε)a⁺b)*(a⁺c + a + c⁺(a⁺c + a + ε))

Because every transformation preserved equivalence, ((c⁺ + ε)a⁺b)*(a⁺c + a + c⁺(a⁺c + a + ε)) describes exactly the words that can be derived from the initial symbol S, i.e. the language generated by the grammar.

It ain't pretty, but I purposefully picked a grammar including cycles to portray the way the algorithm works. The hardest part is recognizing that S = xS + x is equivalent to S = x⁺ (more generally, S = xS + y gives S = x*y when x cannot produce ε), then just doing the substitutions.
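
For anyone who wants to check such a derivation mechanically, here is a minimal Python sketch. It takes the equation system S = aA + cS + a + c, A = aA + bS + c, eliminates the variables with the rule X = xX + y ⇒ X = x*y, and compares the result with the grammar on all short words. All function names are made up for this example, and it assumes every rule is one terminal optionally followed by one single-letter non-terminal. The regex it prints uses Python `re` syntax and will look different from a hand-derived one.

```python
import re
from itertools import product

def union(r, s):
    """r + s, where None stands for 'no term yet'."""
    if r is None:
        return s
    if s is None:
        return r
    return f"{r}|{s}"

def to_equations(rules):
    """{'S': ['aA', 'a']} -> {'S': {'A': 'a', None: 'a'}}; key None is the constant term."""
    eqs = {}
    for lhs, alternatives in rules.items():
        eqs[lhs] = {}
        for rhs in alternatives:
            key = rhs[1:] or None  # trailing non-terminal, if any
            eqs[lhs][key] = union(eqs[lhs].get(key), rhs[0])
    return eqs

def solve(equations, start):
    """Eliminate every variable in turn; the start symbol goes last."""
    eqs = {x: dict(rhs) for x, rhs in equations.items()}
    for x in [v for v in eqs if v != start] + [start]:
        rhs = eqs[x]
        loop = rhs.pop(x, None)
        if loop is not None:  # X = loop X + rest  =>  X = loop* rest
            rhs = {y: f"(?:{loop})*(?:{r})" for y, r in rhs.items()}
        eqs[x] = rhs
        for y, other in eqs.items():  # substitute X into the other equations
            if y != x and x in other:
                coeff = other.pop(x)
                for z, r in rhs.items():
                    other[z] = union(other.get(z), f"(?:{coeff})(?:{r})")
    return eqs[start].get(None)

def generates(rules, symbol, word):
    """True if `word` can be derived from `symbol` (reference implementation)."""
    for rhs in rules[symbol]:
        terminal, rest = rhs[0], rhs[1:]
        if word[:1] != terminal:
            continue
        if not rest and len(word) == 1:
            return True
        if rest and generates(rules, rest, word[1:]):
            return True
    return False

RULES = {"S": ["aA", "cS", "a", "c"], "A": ["aA", "bS", "c"]}
regex = solve(to_equations(RULES), "S")
print(regex)

# Brute-force comparison on every word over {a, b, c} up to length 7.
for n in range(1, 8):
    for letters in product("abc", repeat=n):
        word = "".join(letters)
        assert bool(re.fullmatch(regex, word)) == generates(RULES, "S", word), word
```

The brute-force loop is not a proof, but it catches dropped terms quickly, which is the most common slip when doing the substitutions by hand.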

score 0 · Answer 4 · answered Jan 10 '22 at 14:09

I'll leave this as an answer to this old question, in case anybody finds it useful:

I have recently released a library for exactly that purpose:

https://github.com/rindPHI/grammar2regex

You can precisely convert regular grammars, but also compute approximate regular expressions for more general context-free grammars. The output format can be configured to be a custom ADT type or the regular expression format of the z3 SMT solver (z3.ReRef).

Internally, the tool converts grammars to finite automata. If you're interested in the automaton itself, you can call the method right_linear_grammar_to_nfa.
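
To give a rough idea of what a grammar-to-automaton step looks like in general, here is a small self-contained Python sketch. It is not the library's code and does not use its API: `grammar_to_nfa` and `accepts` are hypothetical names for this illustration, and the sketch assumes every rule is one terminal optionally followed by one single-letter non-terminal. The grammar is the one with overlapping loops quoted in a comment further up.

```python
def grammar_to_nfa(rules, start):
    """One NFA state per non-terminal, plus one extra accepting state.

    A rule X -> aY becomes the transition X --a--> Y;
    a rule X -> a becomes X --a--> <accept>.
    """
    accept = "<accept>"
    delta = {}  # (state, terminal) -> set of successor states
    for lhs, alternatives in rules.items():
        for rhs in alternatives:
            terminal, target = rhs[0], rhs[1:] or accept
            delta.setdefault((lhs, terminal), set()).add(target)
    return delta, start, {accept}

def accepts(nfa, word):
    """Simulate the NFA by tracking the set of reachable states."""
    delta, start, finals = nfa
    current = {start}
    for symbol in word:
        current = {t for s in current for t in delta.get((s, symbol), ())}
    return bool(current & finals)

# A -> aB, B -> bC, C -> cA | cD, D -> dB | d
rules = {"A": ["aB"], "B": ["bC"], "C": ["cA", "cD"], "D": ["dB", "d"]}
nfa = grammar_to_nfa(rules, "A")

print(accepts(nfa, "abcd"))     # True
print(accepts(nfa, "abcdbcd"))  # True
print(accepts(nfa, "abc"))      # False
```

From such an NFA, any automaton-to-expression method (state elimination, for instance) yields a regular expression.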
