Context-free grammar for L = {2^x ∗ 2^y ∗ 2^z = 2^(x+y+z) | x, y, z > 0}

Question

The title explains it. I have trouble with synchronizing the left side of the "equation" with the right one, as whenever I generate a 2 on the left, one on the right has to appear. Could it be that this language isn't context-free? Thanks in advance!

L = {2^x ∗ 2^y ∗ 2^z = 2^(x+y+z) | x, y, z > 0}

Edit: This has NOTHING to do with mathematical equations. The "*" and "=" are merely symbols of the alphabet of the language, and 2 to the power of "x" means that the symbol 2 is repeated x times.

Example of this language:
222*2*22=222222 
2*2*2=222
2*222222*22=222222222

This is unclear - are you trying to identify a grammar that accepts only *valid* mathematical equations? — Oliver Charlesworth, Apr 30 '17 at 16:18
It isn't clear what you want. Normally, when you use set-builder notation, what goes to the left side of “|” is an expression, not a proposition (unless you want the elements of your set to be themselves truth values). — isekaijin, Apr 30 '17 at 16:18
Also, in formal language theory, normally the multiplication and exponentiation operators are overloaded to denote string concatenation and repetition. When you say “2^x”, do you mean raising the number 2 to the x-th power, or repeating the string “2” x times? — isekaijin, Apr 30 '17 at 16:21
These aren't mathematical equations. I have an alphabet { 2, =, * } and this language includes words such as 2*2*2=222 or 22*2222*222=222222222 — Nanuna, Apr 30 '17 at 16:22
Ah, okay! I'd suggest you to include that bit of information in your question. — isekaijin, Apr 30 '17 at 16:23
Yes, those examples would have made it a lot clearer what you're referring to here :) — Oliver Charlesworth, Apr 30 '17 at 16:23
Your language is obviously context-free. Just think about the automaton that recognizes this language: Read x “2”s and push them into the stack. Read an asterisk. Read y “2”s and push them into the stack. Read an asterisk. Read z “2”s and push them into the stack. Read an “=”. Read a “2” for every element you can pop off of the stack. — isekaijin, Apr 30 '17 at 16:36
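
The stack-based recognizer described in the comment above can be sketched in Python. This is only an illustration under my own assumptions: the function name `accepts` is hypothetical, and a Python list stands in for the automaton's stack.

```python
def accepts(word: str) -> bool:
    """Simulate the stack-based recognizer for 2^x*2^y*2^z=2^(x+y+z), x, y, z > 0."""
    stack = []
    pos = 0
    # Left side: three non-empty blocks of 2s, terminated by '*', '*' and '='.
    for separator in "**=":
        block_start = pos
        while pos < len(word) and word[pos] == "2":
            stack.append("2")  # push one entry per 2 read
            pos += 1
        if pos == block_start:
            return False  # an empty block means x, y or z would be 0
        if pos >= len(word) or word[pos] != separator:
            return False  # wrong or missing separator
        pos += 1
    # Right side: pop one stack entry for every 2 read.
    while pos < len(word) and word[pos] == "2":
        if not stack:
            return False  # more 2s on the right than on the left
        stack.pop()
        pos += 1
    # Accept only if the whole input was consumed and the stack is empty.
    return pos == len(word) and not stack
```

For example, `accepts("2*2*2=222")` holds, while `accepts("2*2=22")` does not because the second `*` is missing.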

isekaijin · Answer 1 · 2017-04-30T18:44:43.143

1

Using basic facts about ~~multiplication~~ string concatenation and ~~exponentiation~~ repetition, we can redefine your language as:

L = { 2^x * 2^y * 2^z = 2^z 2^y 2^x | x, y, z > 0 }

This language definition can be further elaborated as:

Lz = { 2^z = 2^z | z > 0 }
Ly = { 2^y * w 2^y | y > 0, w ∈ Lz }
Lx = { 2^x * w 2^x | x > 0, w ∈ Ly }
L = Lx

Then we can define a grammar for Lz:

Z   ::=   2 = 2
Z   ::=   2 Z 2

And one for Ly:

{ include Lz's grammar }
Y   ::=   2 * Z 2
Y   ::=   2 Y 2

And one for Lx:

{ include Ly's grammar }
X   ::=   2 * Y 2
X   ::=   2 X 2

Since L = Lx, the combined grammar consists of the six rules above, and its start symbol is X.
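
To see how the three layers nest, here is a small Python sketch that builds a word by following the productions for Z, Y and X directly. The function names `z_word`, `y_word` and `x_word` are mine, chosen for illustration; each function applies the recursive rule until its count reaches 1 and then applies the other rule.

```python
def z_word(z: int) -> str:
    """Derive 2^z = 2^z: Z ::= 2 Z 2 applied z-1 times, then Z ::= 2 = 2."""
    if z < 1:
        raise ValueError("z must be positive")
    return "2=2" if z == 1 else "2" + z_word(z - 1) + "2"


def y_word(y: int, z: int) -> str:
    """Derive 2^y * w 2^y with w in Lz: Y ::= 2 Y 2 applied y-1 times, then Y ::= 2 * Z 2."""
    if y < 1:
        raise ValueError("y must be positive")
    return "2*" + z_word(z) + "2" if y == 1 else "2" + y_word(y - 1, z) + "2"


def x_word(x: int, y: int, z: int) -> str:
    """Derive 2^x * w 2^x with w in Ly: X ::= 2 X 2 applied x-1 times, then X ::= 2 * Y 2."""
    if x < 1:
        raise ValueError("x must be positive")
    return "2*" + y_word(y, z) + "2" if x == 1 else "2" + x_word(x - 1, y, z) + "2"


print(x_word(3, 1, 2))  # 222*2*22=222222
```

Every 2 added on the left by a rule is matched by a 2 added on the right by the same rule, which is what keeps the two sides synchronized.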

edited Apr 30 '17 at 18:44

answered Apr 30 '17 at 16:33

isekaijin


This answer would be improved by including more prose to explain the purpose of each set of rules and either including a reference to the notation used or converting it to a widely used notation (e.g. yacc syntax). – Aaron Golden Apr 30 '17 at 17:31

Patrick87 · Answer 2 · 2017-05-01T14:55:29.710

Don't make this harder than it needs to be. Your language has the following characteristics:

Has an = in the middle
The LHS starts and ends with 2 and has 2s and *s in the middle
The RHS is just 2s
The numbers of 2s on the LHS and RHS are equal
The LHS does not contain **.

These rules are easy to put into a grammar:

(P1) S -> 2=2
(P2) S -> 2S2
(P3) S -> 2*S2

The first rule is our base case and establishes that = must always separate the LHS and RHS. It also establishes that the LHS must end with a 2 and that the RHS must start with a 2.

The second and third rules allow us to add more 2s to get longer strings in the language. The second rule says "you can always put a 2 on the front of the LHS, and if you do, you must put one on the end of the RHS". The third rule allows us to put "2*" on the front of the LHS as long as we also put a 2 on the end of the RHS.

Your examples:

222*2*22=222222 
S
2S2                    P2
22S22                  P2
222*S222               P3
222*2*S2222            P3
222*2*2S22222          P2
222*2*22=222222        P1

2*2*2=222
S
2*S2                   P3
2*2*S22                P3
2*2*2=222              P1

2*222222*22=222222222
S
2*S2                   P3 
2*2S22                 P2
2*22S222               P2
2*222S2222             P2
2*2222S22222           P2
2*22222S222222         P2
2*222222*S2222222      P3
2*222222*2S22222222    P2
2*222222*22=222222222  P1
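
The derivations can be replayed mechanically. The following Python sketch is my own illustration (the names `RULES` and `derive` are not from the answer): it rewrites the single S in the current sentential form with the right-hand side of the named production. Note that the sentential forms of the second example require P3, P3, P1.

```python
# Right-hand sides of the three productions for S.
RULES = {"P1": "2=2", "P2": "2S2", "P3": "2*S2"}


def derive(rule_names):
    """Return every sentential form obtained by applying the named rules in order."""
    form = "S"
    steps = [form]
    for name in rule_names:
        if "S" not in form:
            raise ValueError("no nonterminal left to rewrite")
        form = form.replace("S", RULES[name], 1)  # rewrite the only S
        steps.append(form)
    return steps


for step in derive(["P2", "P2", "P3", "P3", "P2", "P1"]):
    print(step)  # the last line printed is 222*2*22=222222
```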

A formal correctness proof for this grammar would involve showing that (a) every string in the language is generated and (b) every string generated is in the language. We can do both using induction:

Proof: By induction on string length. Base case: the shortest string in the language is 2=2, generated by P1. There are no shorter generated strings. Induction hypothesis: assume that, among strings of length less than k, the strings generated by the grammar are exactly the strings in the language. Induction step: we must show that the two sets also agree on strings of length k. If we have a string of length k > 3 in the language (alternatively, generated by the grammar), it must be of the form 2x2 or 2*x2, where x is another string in the language (alternatively, generated by the grammar). Since x has length less than k, the induction hypothesis implies it can be generated by the grammar (alternatively, that it is in the language); and both forms can then be generated (alternatively, are in the language) as a result: by one application of P2 or of P3, respectively (alternatively, by the definition of the language itself).
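
As a sanity check (not a substitute for the proof), one can compare the strings generated by P1 to P3 with the strings satisfying the five listed characteristics, up to a bounded length. The sketch below is my own; the names `generated`, `has_characteristics` and `described` are hypothetical, and the bound 9 is arbitrary.

```python
from itertools import product


def generated(max_len):
    """All terminal strings of length <= max_len derivable from S with P1, P2, P3."""
    words, frontier = set(), ["S"]
    while frontier:
        form = frontier.pop()
        for rhs in ("2=2", "2S2", "2*S2"):
            new = form.replace("S", rhs, 1)
            if len(new) > max_len:
                continue  # forms only grow, so this branch cannot yield a short word
            if "S" in new:
                frontier.append(new)
            else:
                words.add(new)
    return words


def has_characteristics(w):
    """Check the five characteristics listed at the top of the answer."""
    if w.count("=") != 1:
        return False
    lhs, rhs = w.split("=")
    return (
        set(rhs) == {"2"}                               # RHS is just 2s (and non-empty)
        and lhs.startswith("2") and lhs.endswith("2")   # LHS starts and ends with 2
        and set(lhs) <= {"2", "*"}                      # LHS has only 2s and *s
        and "**" not in lhs                             # no two adjacent *s
        and lhs.count("2") == len(rhs)                  # equal numbers of 2s
    )


def described(max_len):
    """All strings over {2, *, =} of length <= max_len with the five characteristics."""
    return {
        "".join(chars)
        for n in range(1, max_len + 1)
        for chars in product("2*=", repeat=n)
        if has_characteristics("".join(chars))
    }


print(generated(9) == described(9))
```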

UPDATE:

A comment brought to my attention that the number of * is supposed to be fixed at 2. This requires a change in the definition of the grammar:

S -> 2S2 | 2*R2
R -> 2R2 | 2*T2
T -> 2T2 | 2=2

This changes the above arguments in relatively minor and predictable ways. Basically, we keep track of the number of applications of P3 and disallow further applications after the second, while simultaneously only allowing the elimination of all nonterminals after we have seen at least two applications.
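
A bounded check of the updated grammar against the language from the question can be sketched in Python as follows. The code is my own illustration: `GRAMMAR`, `generated` and `language` are hypothetical names, and the length bound is arbitrary. It relies on the fact that every sentential form of this grammar contains exactly one nonterminal.

```python
# The updated grammar: S, R and T count how many * have been emitted so far.
GRAMMAR = {
    "S": ["2S2", "2*R2"],
    "R": ["2R2", "2*T2"],
    "T": ["2T2", "2=2"],
}


def generated(max_len):
    """All terminal strings of length <= max_len derivable from S."""
    words, frontier = set(), ["S"]
    while frontier:
        form = frontier.pop()
        nonterminal = next(c for c in form if c in GRAMMAR)
        for rhs in GRAMMAR[nonterminal]:
            new = form.replace(nonterminal, rhs, 1)
            if len(new) > max_len:
                continue  # forms only grow, so nothing short can follow
            if any(c in GRAMMAR for c in new):
                frontier.append(new)
            else:
                words.add(new)
    return words


def language(max_len):
    """All words 2^x*2^y*2^z=2^(x+y+z) with x, y, z > 0 and length <= max_len."""
    words = set()
    for x in range(1, max_len):
        for y in range(1, max_len):
            for z in range(1, max_len):
                w = "2" * x + "*" + "2" * y + "*" + "2" * z + "=" + "2" * (x + y + z)
                if len(w) <= max_len:
                    words.add(w)
    return words


print(generated(15) == language(15))
print(min(generated(15), key=len))  # 2*2*2=222
```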

I'm pretty sure the smallest string in his language should be `2*2*2=222`. — isekaijin, May 01 '17 at 14:21
@pyon Ah, good catch - I missed the part about the number of `*` being fixed at 2. I will update the answer. — Patrick87, May 01 '17 at 14:52

Aaron Golden · Answer 3 · 2017-05-02T03:50:26.673

-1

Here is an example of a context-free grammar in which strings like 222*2*22=222222, 2*2*2=222, etc. are grammatical.

<literal>: 2 | <literal>2;
<number>: <literal> | <number>*<number>;
<expression>: <number>=<number>;

With these productions the strings you want are all valid <expression>s. Strings that are "wrong" are also grammatical, like:

22=2

It isn't clear to me from your question whether or not that's a problem. I can imagine that the first order of business in your project is to parse strings without worrying about semantics, and then to evaluate the semantics of grammatical strings.
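
To illustrate the "parse first, check semantics afterwards" idea, here is a Python sketch. It is an assumption-laden restatement rather than part of the grammar above: the strings derivable from <expression> are exactly those matching the regular expression used below, and the names `is_grammatical` and `twos_balance` are hypothetical.

```python
import re

# <number> derives one or more blocks of 2s separated by single *s.
NUMBER = r"2+(?:\*2+)*"
# <expression> is <number>=<number>.
EXPRESSION = re.compile(rf"{NUMBER}={NUMBER}")


def is_grammatical(word: str) -> bool:
    """Syntactic check only: does the word have the shape <number>=<number>?"""
    return EXPRESSION.fullmatch(word) is not None


def twos_balance(word: str) -> bool:
    """Separate 'semantic' check: same number of 2s on both sides of '='."""
    lhs, rhs = word.split("=")
    return lhs.count("2") == rhs.count("2")


for word in ("222*2*22=222222", "22=2"):
    print(word, is_grammatical(word), twos_balance(word))
```

Both sample words pass the syntactic check, but only the first also passes the count check.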

Edit: I'm curious to know if there's actually something wrong with my answer or if I've just received a "stay off my turf" downvote.

edited May 02 '17 at 03:50

answered Apr 30 '17 at 17:01

Aaron Golden


Presumably the OP is taking a course on formal languages (and possibly automata as well). The language he defined in his question isn't intended to serve any practical purpose. In particular, it doesn't have any semantics. The point to the exercise is just to practice the skill of coming up with a context-free grammar for a context-free language. – isekaijin Apr 30 '17 at 17:15
@pyon That's the impression I get as well, but I was thinking there might be some other context to the exercise. Like maybe the challenge is to set things up so that only semantically valid strings are grammatical (in which case I don't know how to answer that off the top of my head). – Aaron Golden Apr 30 '17 at 17:25
