A DFA seems easier to make here, so we can start there and derive the regular expression from there.
We will need at least one, initial, state. This state cannot be accepting because the empty string does not start and end with b
. We'll call this q0
.
If we see an a
in this state, we are looking at a string that does not start with b
, so we cannot accept it no matter what comes next. We can represent this with a new, dead, state. We'll call this q1
.
If we see a b
in q0
, we need a new state to represent the fact that we are well on the way to seeing a string that meets the criteria. Indeed, the string b
starts and ends with a b
and has an even number of a
(zero is even); so this state must be accepting. Call this q2
.
If we see an a
in q2
then we have an odd number of a
s and did not last see a b
, so we cannot accept the string. However, it's still possible to accept a string from this state by seeing an odd number of a
s followed by at least one b
. Call the state to represent this q3
.
If we see a b
in q2
, we're in the same situation as before (even number of a
and last saw a b
, so we can accept). Stay in q2
.
If in q3
and we see an a
, we now have an even number of a
again and just need a b
. Call this new state q4
. If we see a b
, we still need an a
so we might as well stay in q3
.
If in q4
and we see an a
, we again need more a
s and can return to q3
. If, on the other hand, we get a b
, we can return to q2
since the string is in our language.
The DFA looks like this:
q s q'
-- -- --
q0 a q1 q0: initial state
q0 b q2 q1: dead state, did not begin with b
q1 a q1 q2: accepting state, even #a and start/stop with b
q1 b q2 q3: start with b, odd #a
q2 a q3 q4: start with b, even #a, stop with a
q2 b q2
q3 a q4
q3 b q3
q4 a q3
q4 b q2
To get the regular expression we can find regular expressions leading to each state, iteratively, and then take the union of regular expressions for accepting states. In this case, only q2
is accepting so all we need is a regular expression for that state. We proceed iteratively, substituting at each stage.
round 0
(q0): e
(q1): (q0)a + (q1)(a+b)
(q2): (q0)b + (q2)b + (q4)b
(q3): (q2)a + (q3)b + (q4)a
(q4): (q3)a
round 1
(q0): e
(q1): a + (q1)(a+b) = a(a+b)*
(q2): b + (q2)b + (q4)b = (b+(q4)b)b*
(q3): (q2)a + (q3)b + (q4)a = ((q2)+(q4))ab*
(q4): (q3)a
round 2
(q0): e
(q1): a(a+b)*
(q2): (b+(q3)ab)b*
(q3): ((q2)+(q3)a)ab* = (q2)ab* + (q3)aab* = (q2)ab*(aab*)*
(q4): (q3)a
round3:
(q0): e
(q1): a(a+b)*
(q2): (b+(q3)ab)b*
(q3): (b+(q3)ab)b*ab*(aab*)* = bb*ab*(aab*)*+(q3)abb*ab*(aab*)* = bb*ab*(aab*)*(abb*ab*(aab*)*)*
(q4): (q3)a
round4:
(q0): e
(q1): a(a+b)*
(q2): (b+bb*ab*(aab*)*(abb*ab*(aab*)*)*ab)b*
(q3): bb*ab*(aab*)*(abb*ab*(aab*)*)*
(q4): bb*ab*(aab*)*(abb*ab*(aab*)*)*a
Therefore, a regular expression is this:
r = (b+bb*ab*(aab*)*(abb*ab*(aab*)*)*ab)b*
= bb* + bb*ab*(aab*)*(abb*ab*(aab*)*)*abb*
- The
bb*
part encodes the fact that any string of b
is a string in the language.
- the other part begins and ends with
bb*
which encodes the fact that any string must begin and end with b
- The outermost
a
s encode the fact that any string in the language with a
must have both a first and a last a
- The
aab*
parts allow there to be contiguous pairs of a
- The
abb*ab*
part allow allows there to be non-contiguous pairs of a
As a final note, the rules for replacing expressions as above are the following:
A: r r is an expression
B: As s is an expression
=
A: r
B: rs
A: r + As r, s are expressions
=
A = rs*