6

I'm working on some homework for my compiler class and I have the following problem:

Write a regular expression for all strings of a's and b's that contain an odd number of a's or an odd number of b's (or both).

After a lot of whiteboard work I came up with the following solution:

(aa|bb)* (ab|ba|a|b) ((aa|bb)* (ab|ba) (aa|bb)* (ab|ba) (aa|bb)*)*

However, Is this is the most simplified I can get it? I've considered constructing the DFA trying to minimize the number of states there to see if it would help me simplify but I figured I would ask the regex gurus on SO first.

Bill the Lizard
  • 398,270
  • 210
  • 566
  • 880
  • What advanced features of regex are you allowed to use? – Brad Gilbert Sep 16 '09 at 19:30
  • 6
    he is using regular expressions in Computer Science, not PCRE or posix regex's ;) They are different. – Byron Whitlock Sep 16 '09 at 19:31
  • 1
    @Brad Gilbert, I assume we are only allowed to use the regex which has been introduced so far in the book which isn't much. (*, +, ?, |, [], ^). Pretty plain. –  Sep 16 '09 at 19:33
  • Brings back memories from when I graded such homework as a TA. Some of the most interesting stuff I've graded, by far. :) – Greg D Sep 16 '09 at 19:35
  • +1 for asking "why/how?" when the answer was posted. :) – Greg D Sep 16 '09 at 20:00
  • 1
    Thank you everyone! From the combined insight of Greg D, Walt W, and sepp2k we were able to come up with a valid answer. –  Sep 16 '09 at 20:14

4 Answers4

8

Take Greg D's recommendation of starting with a(aa)*, and going from there. Sepp2k almost has it right, but the real consideration is that you don't care about the other letter. What I mean is, when you are looking at the "odd number of a's" constraint, you don't care at all about what b's are in your string. Thus, stick b*'s anywhere you can :)

Sepp2k's answer is almost right, but this one is correct:

b* a b* (a b* a b* )* | a* b a* (b a* b a* )*

To elaborate, this regex figures out all strings with an odd number of a's (first section), and OR's those strings with any strings containing an odd number of b's.

Walt W
  • 3,261
  • 3
  • 30
  • 37
  • @Walt W, I'm running this one through its paces but I think you are correct. – mmcdole Sep 16 '09 at 20:05
  • please tell me the regular expression for any string that contain even number of a's and even number of b's ? – nadeem gc Sep 21 '13 at 07:00
  • Do you mean an even number of a's OR an even number of b's? I suppose you could do an AND with zero-length lookaheads... That's not standard regex stuff though. If you want to change this equation from odd to even, just drop the first two terms of each segment (b*a from the left side and a*b from the right side) – Walt W Sep 23 '13 at 23:35
5

This should work:

b* a b* (a b* a b*)* |  a* b a* (b a* b a*)*
sepp2k
  • 363,768
  • 54
  • 674
  • 675
  • 3
    I was writing something similar :) To elaborate, this regex figures out all strings with an odd number of a's (first section), and OR's those strings with any strings containing an odd number of b's. There's a slight error here though, as first term needs b* at the end, and second option needs a* at the end. Otherwise, abbba will not be accepted. – Walt W Sep 16 '09 at 19:55
  • @sepp2k, this is working in all my test cases. Can you describe your thought process when you were making that? It is far simpler than the path I was going down. – mmcdole Sep 16 '09 at 19:55
  • Nobody said it can't be ambiguous. Walt is correct, it isn't finished, but all the important bits are there. :) – Greg D Sep 16 '09 at 19:57
  • @Walt W: You're absolutely right. Fixed. @Simucal: Walt explained it pretty well. – sepp2k Sep 16 '09 at 19:57
  • @sepp2k, yea, we submitted out comments at the same time. – mmcdole Sep 16 '09 at 19:58
  • @sepp2k: Err, I was a little wrong, you need the b* / a* at the end to be inside the parenthesis, or else you can't have b's between two groups of two a's (like aaabbaa). I posted an answer with that fix.. – Walt W Sep 16 '09 at 20:01
  • I think this might fail on: ABBBABB. – mmcdole Sep 16 '09 at 20:01
  • Simulcal: Refer to my last comment / answer. – Walt W Sep 16 '09 at 20:03
2

I'm afraid that I don't believe your regex as written is correct. Consider the string:

aba

We have a couple choices for matches, but the fact that it's odd-length means we must match a lone a at the front, so:

(a)(ba)

But, sadly, it's impossible for your second main grouping there to match (ba).

When dealing with a constraint like this, I found it easier to start from the core constraint and go from there. In this case, your constraint is "odd," so start with

a(aa)*

to force an odd number of a's and go from there. :)

Greg D
  • 43,259
  • 14
  • 84
  • 117
0

I think you need to approach the problem differently.

You are trying to match anything that doesn't have even numbers of both a and b.

It would probably be easier to start with something that matches even numbers of a and b. All you have to do at that point would be to add something on the end that matches the smallest string that you actually want to match.

Brad Gilbert
  • 33,846
  • 11
  • 78
  • 129