Simplify this regular expression

Question

I'm doing some pre-exam exercises for my compilers class, and needed to simplify this regular expression.

(a U b)*(a U e)b* U (a U b)*(b U e)a*

Quite obviously, the e is the empty string, and the U stands for union.

So far, I think one of the (a U b)* can be removed, since a U a = a. However, I can't find any other simplifications, and am not doing so well with the other problems thus far. :(

Any help is appreciated, thanks very much!

I would very much prefer explanations to any answers, or even hints rather than answers, I am doing pre-exam exercises, answers without explanations won't help me very much! Thanks! — CompilersBeginner, Feb 10 '11 at 01:58
I do believe my accepted answer is wrong, as pointed out in the comments. I do think that the result is indeed (a U b)* but my explanation is not correct. — Piva, Feb 10 '11 at 02:46

tobyodavies · Answer 1 · 2011-02-10T04:07:48.190

First translate to an English description of the language:

(a U b)*(a U e)b* U (a U b)*(b U e)a*

Translates to:

Any sequence of as or bs, followed by an optional a, followed by any number of bs.

OR

Any number of as and bs, followed by an optional b, followed by any number of as.

There is a lot of overlap here. At the very least, (a U b)*(a U e) is exactly the same as (a U b)*: a sequence of as and bs followed by an optional a is still just a sequence of as and bs, and the optional a can always be taken as epsilon, so nothing is gained or lost. The same goes for (b U e) on the right. Those groups can be eliminated, leaving

(a U b)*b* U (a U b)*a*

Translates to:

Any sequence of as or bs, followed by any number of bs.

OR

Any number of as and bs, followed by any number of as.

Now the first section of those two outermost groups is the same, so let's collapse those into one:

(a U b)*(a* U b*)

Translates to:

Any sequence of as or bs, followed by any number of as OR by any number of bs.

Now hold on a minute: "any sequence of as and bs" followed by "any number of as OR any number of bs" is still just a sequence of as and bs, and anything that matches the first part already matches the whole regex (because the second part can have a length of zero). So why don't we just make it

(a U b)*

Ta Da. Simple.
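
If you want a sanity check on the steps above, here is a minimal Python sketch that compares all four expressions on every string over {a, b} up to a small length. The translation into Python's `re` syntax (`U` as `|`, `a U e` as `a?`) and the bound `MAX_LEN` are my own choices for illustration. A bounded check like this can expose a wrong step, but it is not a proof of equivalence.

```python
import re
from itertools import product

# Each step of the simplification, written in Python regex syntax:
# "U" becomes "|" and "a U e" (a or the empty string) becomes "a?".
STEPS = [
    r"(a|b)*a?b*|(a|b)*b?a*",  # original expression
    r"(a|b)*b*|(a|b)*a*",      # after dropping (a U e) and (b U e)
    r"(a|b)*(a*|b*)",          # after factoring out (a U b)*
    r"(a|b)*",                 # final answer
]

def language(pattern, max_len):
    """Return the strings over {a, b} of length <= max_len that fully match."""
    compiled = re.compile(pattern)
    matched = set()
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if compiled.fullmatch(s):
                matched.add(s)
    return matched

MAX_LEN = 8  # arbitrary bound, chosen only to keep the check fast
languages = [language(step, MAX_LEN) for step in STEPS]

# True when every step accepts exactly the same strings as the original.
all_agree = all(lang == languages[0] for lang in languages)
print(all_agree)
```

Using `fullmatch` matters here: `search` or `match` would accept a prefix or substring, which is not what "the string is in the language" means.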

By your reasoning (a U b)*b would reduce the same way... but it doesn't. You have to make sure not only that all matches continue to match, but that all rejected inputs are still rejected, and your argument missed the latter. — Ben Voigt, Feb 10 '11 at 02:15
@Ben Which step loses information? A lot of the simplifications rely on all the trailing bits being able to have 0 length, which your example doesn't so i'm not sure to what you are referring. — tobyodavies, Feb 10 '11 at 03:58
@Ben, I _think_ you are referring to my 1st or 3rd steps, where I say that because the 1st repetition _necessarily_ ends with the second we can remove the second, which again does not apply to your example - `(a U b)*` does not _necessarily_ end with a `b` - it could end in `a`, so the simplification does not apply. — tobyodavies, Feb 10 '11 at 04:34

score 1 · Accepted Answer · edited Feb 10 '11 at 02:12


Little rusty on regex, but if * still represents "zero or more occurrences" you can replace:

(a U e)b* with (a U b)*

which leaves the first part with:

(a U b)*(a U b)* = (a U b)*

On the right side, you have that

(b U e)a* = (b U a)*

Now, since a U b = b U a, you get:

(a U b)*(a U b)*

on the right hand side, which leaves just

(a U b)* U (a U b)* = (a U b)*

I think that's it...

edited Feb 10 '11 at 02:12 by BoltClock · answered Feb 10 '11 at 01:59 by Piva

Oh my God, my head hurts as my regex is also rusty. You could be right, but *what happened to the 'e' token*? I don't see how it gets eliminated. But it could be possible that I don't see it (again, age and lack of coffee combined with regex rustiness.) – luis.espinal Feb 10 '11 at 02:06
The first step is wrong, as the before only allows 1 `a` and only in the first position. – Ben Voigt Feb 10 '11 at 02:06
@BoltClock: There is no `a?` in the answer that I see. – Jeremiah Willcock Feb 10 '11 at 02:09
@ben-voigt I think that you are right. (a U e)b* = a*b* which is not equivalent to (a U b)* – Piva Feb 10 '11 at 02:34
Is it ok to edit the answer? I think the correct explanation is that (a U b)*(a U e)b* U (a U b)*(b U e)a* = (a U b)*a?b* U (a U b)*b?a* = (a U b)* U (a U b)* = (a U b)* where ? is 0 or 1 occurrence. – Piva Feb 10 '11 at 02:49
@Piva, I think the 2nd last step in your comment should be `(a U b)*(a* U b*)` because `a* U b* != (a U b)*`. The final answer is right, but none of the working seems to be in the answer... The comment is better, can you edit since you've already got the tick... – tobyodavies Feb 10 '11 at 04:38
@Piva: No, `(a U e)b*` is `a?b*`. Zero or one `a`, not more. – Ben Voigt Feb 10 '11 at 06:03
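
To make the objection in these comments concrete, here is a small Python sketch that lists the short strings on which `(a U e)b*` and `(a U b)*` disagree. The helper name `differences` and the length bound of 2 are assumptions made for illustration; the patterns are my translation into Python's `re` syntax.

```python
import re
from itertools import product

def differences(p1, p2, max_len):
    """Strings over {a, b} up to max_len that exactly one pattern fully matches."""
    r1, r2 = re.compile(p1), re.compile(p2)
    out = []
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if bool(r1.fullmatch(s)) != bool(r2.fullmatch(s)):
                out.append(s)
    return out

# (a U e)b* is a?b* in Python syntax; (a U b)* is (a|b)*.
diffs = differences(r"a?b*", r"(a|b)*", 2)
print(diffs)  # ['aa', 'ba']
```

Both `aa` and `ba` are in `(a U b)*` but not in `(a U e)b*`, since the latter allows at most one `a` and only in the first position.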

score 0 · Answer 3 · answered Feb 10 '11 at 01:56


I think the whole thing is equivalent to (a U b)* (or in most regex grammars, (a|b)*)

answered Feb 10 '11 at 01:56 by Ben Voigt

Since I'm doing practice problems for an exam, I'm much more interested in how you came to this conclusion. Could you share that? – CompilersBeginner Feb 10 '11 at 01:57

Here's my reasoning: look at the left branch of the top union. First, take `(a U b)*(a U e)`; distribute the concatenation through the union to get `(a U b)*a U (a U b)*`. The second part is a superset of the first one, so that collapses into `(a U b)*`. Adding the `b*` onto the end does the same thing: anything that is matched by `(a U b)*b*` will also be matched by `(a U b)*` and vice versa. That makes the RE into `(a U b)* U (a U b)*(b U e)a*`; since the right side can only accept strings of `a` and `b`, it is a subset of the left side, so the RE simplifies to just `(a U b)*`. – Jeremiah Willcock Feb 10 '11 at 02:02

@CompilersBeginner: Two forms are equivalent iff matching the first implies matching the second, and matching the second implies matching the first, right? Your expression never introduces any tokens except `a` and `b`, so anything that matches it also matches `(a U b)*`. And any `(a U b)*` matches yours by taking the first branch, `(a U b)*(a U e)b*`, then choosing the empty-string branch `(a U b)* e b*`, and finally choosing repetition count 0 for `b*`. – Ben Voigt Feb 10 '11 at 02:03
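
The two-inclusion argument in these comments can be written out as a short derivation. The symbols $\Sigma$ and $R$ are notation introduced here for convenience and are not part of the original exercise.

```latex
Let $\Sigma = \{a, b\}$ and
$R = (a \cup b)^*(a \cup \varepsilon)b^* \;\cup\; (a \cup b)^*(b \cup \varepsilon)a^*$.

\begin{align*}
L(R) &\subseteq \Sigma^*
  && \text{(only the symbols $a$ and $b$ occur in $R$)} \\
\Sigma^* &= \Sigma^* \cdot \{\varepsilon\} \cdot \{\varepsilon\}
  \subseteq L\big((a \cup b)^*(a \cup \varepsilon)b^*\big)
  \subseteq L(R)
  && \text{(choose $\varepsilon$, then zero copies of $b$)}
\end{align*}

Hence $L(R) = \Sigma^* = L\big((a \cup b)^*\big)$.
```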

FabianB · Answer 4 · 2011-02-10T02:37:39.667

I'll give you an idea of how I would solve it (not very formal, and no guarantee):

Look at the left side of the main U:

(a U b)* - What does it mean? A combination of a's and b's of length n, where n >= 0.

Next comes (a U e). What do we have here? An a or the empty word. If we wanted that a, we could just have gotten it in the previous part already. If we want the e, well, we can leave it out anyway. Please note that we don't have to take an a, because we have the option to choose e. So we can skip this whole part.

What is next? b*. What is that? As many b's as we want. We could have gotten those in the first part as well, so we can leave that out!

So the only thing on the left is (a U b)*.

Let's have a look at the right side:

OK, this is easy now: we can use the same idea, it is just different letters.

We will also get (a U b)* in the same way.

So in the end we have (a U b)* U (a U b)* which you know is equal to (a U b)*.

By your reasoning `(a U b)*b` would reduce the same way... but it doesn't. — Ben Voigt, Feb 10 '11 at 02:14
Thanks for the hint, I was trying to give a less formal approach in my answer. I am not sure how to improve the answer without giving that up. — FabianB, Feb 10 '11 at 02:41
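
One way to keep the informal argument while answering the objection: a trailing piece can be dropped after `(a U b)*` only if it uses nothing but `a` and `b` and can also match the empty word. The Python sketch below checks the second condition for the pieces discussed in this thread. The helper name `can_be_empty` is made up for illustration, and the patterns are my translation into Python's `re` syntax.

```python
import re

def can_be_empty(pattern):
    """True if the pattern fully matches the empty string."""
    return re.fullmatch(pattern, "") is not None

# a? stands for (a U e); b* and b are the two trailing pieces being compared.
results = {suffix: can_be_empty(suffix) for suffix in ["a?", "b*", "b"]}
print(results)  # {'a?': True, 'b*': True, 'b': False}
```

`(a U e)` and `b*` can both be empty, so skipping them after `(a U b)*` loses nothing. A bare `b` cannot, which is why `(a U b)*b` does not reduce: it rejects the empty word, while `(a U b)*` accepts it.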
