Construct a regular expression to match the following language

Question

I am working on a thinking exercise handed out by my professor at the end of a lecture. The problem is to construct a DFA given a specific language definition. Before I construct the DFA, the first thinking exercise is to convert the language definition into a regular expression.

The provided alphabet is binary: {0, 1}.

The language definition is quite informal:

The language defining the set of binary strings in which every substring of length 3 has at least one zero

So examples of strings that match this definition would be 000, 001, 1010 and so forth.

My trouble is coming up with a regular expression to match this language definition. I tried playing around on http://regexr.com/ but I only found that '..0' matches every three characters with a zero at the end. I'm not sure how to match every sub-string in the way the language is defined, or if it is even possible.

Is there a way to construct a regular expression for this problem?

score 3 · Accepted Answer · answered Oct 03 '16 at 06:43

Lateral thinking required. Don't write the regex for the informal language definition directly; write it for the property that the definition implies.

Spoilers (hints first, then the solution):

Hint 1:

If any arbitrary 3-length substring must have a 0-digit, then it is impossible to have 3 digits in a row that are 1-digits.

Hint 2:

This means that between any two consecutive 0-digits there are at most two 1-digits.

Hint 3:

This makes it a language where, after 0-2 1-digits, there comes any number (possibly zero) of groups, each consisting of a 0-digit followed by 0-2 1-digits.

Solution:

`^1{0,2}(01{0,2})*$`, or equivalently and more mathematically, `^(11?)?(0(11?)?)*$`
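One way to convince yourself that the pattern really captures the definition is to compare the two on every short binary string. The following is a minimal sketch in Python, not part of the original answer: the function names are made up for illustration, and `re.fullmatch` stands in for the `^...$` anchors.

```python
import re
from itertools import product

# The pattern from the solution; fullmatch() plays the role of ^ and $.
PATTERN = re.compile(r"1{0,2}(01{0,2})*")


def every_window_has_zero(s):
    """Literal reading of the definition: each length-3 substring contains a 0.

    Strings shorter than 3 have no such substring, so they pass vacuously.
    """
    return all("0" in s[i:i + 3] for i in range(len(s) - 2))


def regex_accepts(s):
    return PATTERN.fullmatch(s) is not None


def find_mismatches(max_len=10):
    """Return every binary string up to max_len where regex and definition disagree."""
    bad = []
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            s = "".join(chars)
            if regex_accepts(s) != every_window_has_zero(s):
                bad.append(s)
    return bad


if __name__ == "__main__":
    print(find_mismatches())
```

If the pattern and the definition agree, the printed list is empty; any string that does show up is a counterexample worth tracing by hand.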

Amadan

This is great, thank you. How could this regex be extended if the alphabet were now to contain the digit 2 but the informal language did not change? – JavascriptLoser Oct 03 '16 at 09:20
Reread hints, replace "`1`" with "`1` or `2`". Does anything stop making sense? (It's an assignment; the more you try yourself, the more you learn.) – Amadan Oct 03 '16 at 09:25
The logic of the hints makes sense, but I'm clueless as to how to represent "`1` or `2`" as a regular expression pattern – JavascriptLoser Oct 03 '16 at 09:28
Typical regexp: `[12]`. More basically, `(1|2)`. – Amadan Oct 03 '16 at 09:31
I tried `^[1(1|2)]{0,2}(0[1(1|2)]{0,2})*$` and it seems to have done the trick! Just out of curiosity for regex, is there a way to match exactly one zero per 3 characters rather than at least one zero? – JavascriptLoser Oct 03 '16 at 09:42
Not quite. Those were to be either-or. Your regexp would also match `01000(00|1`, for instance. The correct one is `^[12]{0,2}(0[12]{0,2})*$` (more programmery), or `^((1|2)(1|2)?)?(0((1|2)(1|2)?)?)*$` (more automaton-theoristy). As to your other question, reread the hints, as I think you still haven't understood the solution. I am not matching at least one zero per three characters, I'm limiting the number of non-zeroes between two zeroes to at most two. It's splitting semantics, but in programming, you have to be a nitpick. – Amadan Oct 03 '16 at 09:46
If you want to have exactly one zero per any three characters, then zeroes have to be exactly every third position: `^1{0,2}(011)*01{0,2}$`. – Amadan Oct 03 '16 at 09:53
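The patterns from these comments can be checked the same brute-force way. This is a sketch under a few assumptions: the helper names are hypothetical, `re.fullmatch` replaces the `^...$` anchors, and the "exactly one zero" pattern is only compared on strings of length 3 or more. That pattern requires at least one `0`, so it rejects very short strings such as `11` that satisfy the window condition vacuously.

```python
import re
from itertools import product

# Patterns quoted in the comments above, without the ^...$ anchors.
TERNARY = re.compile(r"[12]{0,2}(0[12]{0,2})*")
BROKEN = re.compile(r"[1(1|2)]{0,2}(0[1(1|2)]{0,2})*")  # parens and | are literals here
EXACTLY_ONE = re.compile(r"1{0,2}(011)*01{0,2}")


def windows(s):
    """All length-3 substrings of s (empty list if s is shorter than 3)."""
    return [s[i:i + 3] for i in range(len(s) - 2)]


def at_least_one_zero(s):
    return all("0" in w for w in windows(s))


def exactly_one_zero(s):
    return all(w.count("0") == 1 for w in windows(s))


def mismatches(pattern, prop, alphabet, min_len, max_len):
    """Strings over alphabet where the pattern and the property disagree."""
    bad = []
    for n in range(min_len, max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if (pattern.fullmatch(s) is not None) != prop(s):
                bad.append(s)
    return bad


if __name__ == "__main__":
    print(mismatches(TERNARY, at_least_one_zero, "012", 0, 7))
    print(mismatches(EXACTLY_ONE, exactly_one_zero, "01", 3, 10))
    # Inside [...] the characters ( | ) are ordinary, so the first attempt
    # accepts strings that contain them.
    print(BROKEN.fullmatch("01000(00|1") is not None)
```

The last line illustrates why `[1(1|2)]` was "not quite" right: a character class is already an either-or, so the grouping and alternation symbols placed inside it are matched literally.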
