The textbook teaches us to write regular expressions using the epsilon (ε) symbol, but how can I translate that symbol directly to code without having to completely rework my regular expression?

For instance, how would I write this regex, which should match all lowercase strings that either begin or end in "a" (or both)?

Not 100% sure this is correct but...

((a|epsilon)[a-z]*a) | (a[a-z]*(a|epsilon))

So some strings that should match include:

a //single "a" starts or ends with "a"

aa //starts and ends with "a"

ab //starts with "a"

ba //ends with "a"

aba //starts and ends with "a"

aaaaaaaa //starts and ends with "a"

abbbbbbb //starts with "a"

bbbbbbba //ends with "a"

abbbbbba //starts and ends with "a"

asdfhgdu //starts with "a"

onoineca //ends with "a"

ahnrtyna //starts and ends with "a"

I only want to exchange epsilon for the correct symbol; I do not want to modify any other part of the expression. Also, to be clear, I am not actually checking for the epsilon symbol. I want to have a choice of a character or nothing (well, not nothing... epsilon).

Does such a symbol exist?

Is what I want possible?

ubiquibacon
  • If the character set (and language) supports it, couldn't you just insert `ε` directly? What have you tried? What happened? – jensgram Sep 15 '10 at 09:22
  • @typoknig :) Posted answer adhering to the QA structure of SO. Ignore it if you please. – jensgram Sep 15 '10 at 09:33
  • @jensgram I think I might have been in error. I thought it was working at first but now I am not sure. Just to be clear, I am not checking for the actual `epsilon` character. See my revised question for a better regex example of what I am trying to do. – ubiquibacon Sep 15 '10 at 09:36
  • @typoknig Ah, in that case my answer is probably no good (and it was down-voted). Will remove :) – jensgram Sep 15 '10 at 09:37
  • @jensgram thanks for your help anyways (I didn't down vote you) :) – ubiquibacon Sep 15 '10 at 09:41
  • @typoknig No problem. The answer was wrong anyway :) – jensgram Sep 15 '10 at 09:43
  • @typoknig Perhaps you could add some concrete examples of strings to match (and strings *not* to match)? I'm a little confused, to say the least. – jensgram Sep 15 '10 at 09:46

1 Answer

Just omit the ε, since it denotes the empty string:

([1-9]|)[0-9]*

There’s also a shortcut for this particular case:

([1-9]?)[0-9]*

The ? means zero or one occurrences of the preceding token.
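
To check this yourself, here is a minimal sketch in Python (my choice of language, the question does not name one) that compares the two spellings above. The sample strings are made up for illustration, and `fullmatch` is used so that the whole string has to match rather than just a prefix.

```python
import re

# Textbook form: ([1-9]|ε)[0-9]*
# In code the ε is simply left out, leaving an empty alternative after "|".
empty_alternative = re.compile(r"([1-9]|)[0-9]*")

# The "?" shortcut: zero or one occurrence of [1-9].
optional_token = re.compile(r"([1-9]?)[0-9]*")

# Hypothetical sample inputs, chosen only for illustration.
samples = ["123", "0123", "7", "", "12a"]

for s in samples:
    m1 = empty_alternative.fullmatch(s)
    m2 = optional_token.fullmatch(s)
    # Both patterns should agree on whether the string matches,
    # and on what the first group captured.
    g1 = m1.group(1) if m1 else None
    g2 = m2.group(1) if m2 else None
    print(f"{s!r}: empty-alt group={g1!r}, optional group={g2!r}")
```

Both patterns accept and reject the same strings here, including the empty string, which is what you would expect when ε is one of the alternatives.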

Konrad Rudolph
  • thanks, the example I gave in my question originally was not a good one, please look at my revised question. Also, I am aware of `?`, but I was not sure if that was as close to an epsilon as I could get or not. – ubiquibacon Sep 15 '10 at 09:44
  • @typoknig: Once again, just omit the symbol! Your new example introduces nothing that invalidates this technique, and in fact this technique will **always** work. Apart from that, your expression can be made much simpler: `a[a-z]*|[a-z]*a` – Konrad Rudolph Sep 15 '10 at 09:56
  • thanks for your input, and I see now that the regex in my example could have been simpler. The reason I wanted to use `epsilon` instead of some other symbol is that `epsilon` is what is used in my textbook, so when I am discussing these expressions with others in class I want to be on the same page... using the same methods and symbols. To that end I wanted to use epsilon in a coded regex so I could check my work as I went along. – ubiquibacon Sep 15 '10 at 10:01
  • in the regex checker I use ( http://gskinner.com/RegExr/ ) an "empty string" after an "`|`" is not picked up, but that is not to say that it wouldn't work in some/most code. Thanks for your help. – ubiquibacon Sep 24 '10 at 13:17
  • @typoknig: I suspect that that’s a bug since all major regex engines do in fact support this (I checked …, `foo|` works everywhere while the gskinner RegExr marks it as an error). – Konrad Rudolph Sep 25 '10 at 14:37
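
For anyone who wants to verify the points from the comments, here is a rough sketch in Python (an assumption on my part, since the thread does not name a language). It runs the question's expression with ε omitted and the simplified `a[a-z]*|[a-z]*a` against the sample strings from the question. The non-matching strings and the `matches` helper are hypothetical additions for illustration only.

```python
import re

# The question's expression with ε omitted (empty alternative after "|").
original = re.compile(r"(a|)[a-z]*a|a[a-z]*(a|)")
# The simplified form suggested in the comments.
simplified = re.compile(r"a[a-z]*|[a-z]*a")

# Sample strings taken from the question; all should match.
should_match = [
    "a", "aa", "ab", "ba", "aba", "aaaaaaaa", "abbbbbbb",
    "bbbbbbba", "abbbbbba", "asdfhgdu", "onoineca", "ahnrtyna",
]
# Hypothetical negative cases, not from the question.
should_not_match = ["", "b", "bcd", "bab", "Ab", "a1"]


def matches(pattern, text):
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


for s in should_match + should_not_match:
    print(f"{s!r}: original={matches(original, s)}, "
          f"simplified={matches(simplified, s)}")

# A trailing empty alternative such as "foo|" is accepted by Python's re module.
trailing_empty = re.compile(r"foo|")
print(matches(trailing_empty, "foo"), matches(trailing_empty, ""))
```

In Python at least, the empty alternative compiles without complaint, and the two expressions agree on every string in both lists.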