Don't make this harder than it needs to be. Your language has the following characteristics:
- Has an `=` in the middle
- The LHS starts and ends with `2` and has `2`s and `*`s in the middle
- The RHS is just `2`s
- The numbers of `2`s on the LHS and RHS are equal
- The LHS does not contain `**`.
These rules are easy to put into a grammar:
(P1) S -> 2=2
(P2) S -> 2S2
(P3) S -> 2*S2
The first rule is our base case and establishes that `=` must always separate the LHS and RHS. It also establishes that the LHS must end with a `2` and that the RHS must start with a `2`.
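The second and third rules allow us to add more `2`s to get longer strings in the language. The second rule says "you can always put a `2` on the front of the LHS, and if you do, you must put one on the end of the RHS". The third rule allows us to put a `*` into the LHS, as long as we put a `2` in front of it and a matching `2` on the end of the RHS. Because every `*` is introduced with a `2` in front of it, and whatever follows it starts with a `2`, `**` can never appear.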
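The three productions can be read almost directly as a recursive membership check. The following is a minimal sketch in Python, not part of the original answer; the function name `generated_by_grammar` is made up for illustration. It peels off the terminals that each rule adds and recurses on what is left.

```python
def generated_by_grammar(s: str) -> bool:
    """Return True if s can be derived from S using P1, P2 and P3."""
    # P1: S -> 2=2 (base case)
    if s == "2=2":
        return True
    # P3: S -> 2*S2 (strip "2*" from the front and "2" from the end)
    if s.startswith("2*") and s.endswith("2"):
        if generated_by_grammar(s[2:-1]):
            return True
    # P2: S -> 2S2 (strip one "2" from each end)
    if s.startswith("2") and s.endswith("2"):
        return generated_by_grammar(s[1:-1])
    return False


if __name__ == "__main__":
    for candidate in ["2=2", "2*2*2=222", "2**2=22", "22=2"]:
        print(candidate, generated_by_grammar(candidate))
```

Each recursive call works on a strictly shorter string, so the check always terminates.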
Your examples:
222*2*22=222222
S
2S2 P2
22S22 P2
222*S222 P3
222*2*S2222 P3
222*2*2S22222 P2
222*2*22=222222 P1
2*2*2=222
S
2*S2 P3
2*2*S22 P3
2*2*2=222 P1
2*222222*22=222222222
S
2*S2 P3
2*2S22 P2
2*22S222 P2
2*222S2222 P2
2*2222S22222 P2
2*22222S222222 P2
2*222222*S2222222 P3
2*222222*2S22222222 P2
2*222222*22=222222222 P1
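Derivations like these can be produced mechanically, because the first characters of the remaining string determine which rule applies. The sketch below is an illustration only; `derive` is a hypothetical helper name, and it assumes the grammar P1 to P3 given above.

```python
def derive(target: str) -> list:
    """Return the derivation of target as (sentential form, rule) pairs.

    Raises ValueError if target is not generated by P1-P3.
    """
    steps = [("S", "")]
    left, right = "", ""  # terminals already placed around the nonterminal S
    rest = target         # the part that S still has to produce
    while rest != "2=2":
        if rest.startswith("2*") and rest.endswith("2"):
            # P3: S -> 2*S2
            left, right, rest, rule = left + "2*", "2" + right, rest[2:-1], "P3"
        elif rest.startswith("2") and rest.endswith("2"):
            # P2: S -> 2S2
            left, right, rest, rule = left + "2", "2" + right, rest[1:-1], "P2"
        else:
            raise ValueError(f"{target!r} is not generated by the grammar")
        steps.append((left + "S" + right, rule))
    # P1: S -> 2=2 finishes the derivation
    steps.append((left + "2=2" + right, "P1"))
    return steps


if __name__ == "__main__":
    for form, rule in derive("2*2*2=222"):
        print(form, rule)
```

For `2*2*2=222` this prints `S`, then `2*S2 P3`, `2*2*S22 P3`, and `2*2*2=222 P1`.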
A formal correctness proof for this grammar would involve showing that (a) every string in the language is generated and (b) every string generated is in the language. We can do both using induction:
Proof: By induction on the length of the string.
Base case: the shortest string in the language is `2=2`, generated by P1. There are no shorter generated strings.
Induction hypothesis: assume that, for all strings of length less than `k`, being generated by the grammar and being in the language coincide (the sets are the same up to length `k`).
Induction step: we must show that strings of length `k` are also in agreement. If we have a string of length `k` > 3 in the language (alternatively, generated by the grammar), it must be of the form `2x2` or `2*x2`, where `x` is another string in the language (alternatively, generated by the grammar). Since `x` has length less than `k`, the induction hypothesis implies it can be generated by the grammar (alternatively, that it is in the language); and both forms can be generated (alternatively, are in the language) as a result: by one application of P2 or one application of P3 (alternatively, by the definition of the language itself).
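As a sanity check alongside the proof (not a replacement for it), one can compare the grammar with the informal description on every short string. The sketch below is an assumption-laden illustration: `in_language` encodes my reading of the bullet list at the top, `generated_by_grammar` encodes P1 to P3, and both names are made up here.

```python
from itertools import product


def in_language(s: str) -> bool:
    """Check the informal description of the language directly."""
    if s.count("=") != 1:
        return False
    lhs, rhs = s.split("=")
    if not lhs or not rhs:
        return False
    if set(rhs) != {"2"}:                  # RHS is just 2s
        return False
    if set(lhs) - {"2", "*"}:              # LHS contains only 2s and *s
        return False
    if not (lhs.startswith("2") and lhs.endswith("2")):
        return False
    if "**" in lhs:                        # no two adjacent *s
        return False
    return lhs.count("2") == len(rhs)      # equal numbers of 2s


def generated_by_grammar(s: str) -> bool:
    """Membership check for S -> 2=2 | 2S2 | 2*S2."""
    if s == "2=2":
        return True
    if s.startswith("2*") and s.endswith("2") and generated_by_grammar(s[2:-1]):
        return True
    if s.startswith("2") and s.endswith("2"):
        return generated_by_grammar(s[1:-1])
    return False


def find_mismatches(max_len: int) -> list:
    """Return every string over {2, *, =} up to max_len where the two checks differ."""
    mismatches = []
    for n in range(max_len + 1):
        for chars in product("2*=", repeat=n):
            s = "".join(chars)
            if in_language(s) != generated_by_grammar(s):
                mismatches.append(s)
    return mismatches


if __name__ == "__main__":
    print(find_mismatches(9))
```

An empty result only shows agreement up to the chosen length bound; the induction argument is what covers all lengths.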
UPDATE:
A comment brought to my attention that the number of `*`s is supposed to be fixed at 2. This requires a change in the definition of the grammar:
S -> 2S2 | 2*R2
R -> 2R2 | 2*T2
T -> 2T2 | 2=2
This changes the above arguments in relatively minor and predictable ways. Basically, we keep track of the number of applications of P3 and disallow further applications after the second, while simultaneously only allowing the elimination of all nonterminals after we have seen at least two applications.
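The bookkeeping described here can be sketched as a recognizer in which the current nonterminal records how many `*`s have been placed so far. This is a minimal illustration in Python, assuming the updated grammar exactly as written above; the names `NEXT` and `generated_by_updated` are hypothetical.

```python
# S has seen no "*", R has seen one, T has seen two.
NEXT = {"S": "R", "R": "T"}


def generated_by_updated(s: str, nonterminal: str = "S") -> bool:
    """Membership check for S -> 2S2 | 2*R2, R -> 2R2 | 2*T2, T -> 2T2 | 2=2."""
    # T -> 2=2: only T may eliminate the last nonterminal
    if nonterminal == "T" and s == "2=2":
        return True
    # S -> 2*R2 and R -> 2*T2: placing a "*" advances to the next nonterminal
    if nonterminal in NEXT and s.startswith("2*") and s.endswith("2"):
        if generated_by_updated(s[2:-1], NEXT[nonterminal]):
            return True
    # X -> 2X2: adding a plain "2" keeps the same nonterminal
    if s.startswith("2") and s.endswith("2"):
        return generated_by_updated(s[1:-1], nonterminal)
    return False


if __name__ == "__main__":
    for candidate in ["2*2*2=222", "2*2=22", "2=2", "2*2*2*2=2222"]:
        print(candidate, generated_by_updated(candidate))
```

Because `T` has no entry in `NEXT`, a third `*` can never be consumed, and because only `T` matches `2=2`, a string with fewer than two `*`s is rejected.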