Is there any method of generating arbitrary equivalent regular expressions?

Question

I want to write tests for a regular expression analysis engine. It would be nice if I could generate arbitrary pairs of equivalent regular expressions, to see whether the engine correctly parses them and identifies them as being equivalent. Is there any known algorithm for doing so?

I would also accept a list of 20-100 well-known regex equivalences, if anyone knows of a pre-created list. For example, `a*a` and `aa*`, or `(ab)*a` and `a(ba)*`.

The contents of the groups defined by `(` and `)` will be different, even if both patterns match the same text. — Donut, Nov 16 '21 at 20:07
@Donut by equivalent I mean both regexes define the same regular language. — ahelwer, Nov 16 '21 at 20:12
Ah, I understand. Interesting question... I am curious to see if anyone else has any insight. — Donut, Nov 16 '21 at 20:17

score 2 · Accepted Answer · answered Nov 17 '21 at 16:57

The method I came up with is as follows: I assembled a list of simple regex transformations that preserve equivalence. For example, assuming a and b are equivalent, each of the following maps the pair (a, b) to a new equivalent pair:

f(a, b) ⩴ (a*a, bb*)
f(a, b) ⩴ (aa?, b?b)
f(a, b) ⩴ (ab, ba)
f(a, b) ⩴ (a[\d]+, b[0-9]+)

and so on. I then applied these transformations randomly and iteratively to a pair of starting regexes known to be equal, for example (x, x). The end result is a pair of complicated but equivalent regexes. This generation algorithm is suitable for use in property-based testing.
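
The procedure above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the original implementation: the names `group`, `RULES`, and `generate_pair` are hypothetical, the syntax is that of Python's `re` module, and each operand is wrapped in a non-capturing group so that `*` and `?` apply to the whole sub-expression rather than to its last character.

```python
import random
import re


def group(r):
    # Wrap a sub-expression so postfix operators bind to all of it.
    return f"(?:{r})"


# Each rule takes an equivalent pair (a, b) and returns a new equivalent pair.
# These mirror the four transformations listed above.
RULES = [
    lambda a, b: (f"{group(a)}*{group(a)}", f"{group(b)}{group(b)}*"),
    lambda a, b: (f"{group(a)}{group(a)}?", f"{group(b)}?{group(b)}"),
    lambda a, b: (f"{group(a)}{group(b)}", f"{group(b)}{group(a)}"),
    # Assumption: \d and [0-9] coincide only when \d is ASCII-only,
    # so patterns should be compiled with re.ASCII in Python.
    lambda a, b: (f"{group(a)}[\\d]+", f"{group(b)}[0-9]+"),
]


def generate_pair(steps, rng, start="x"):
    """Apply `steps` randomly chosen rules to the trivially equal pair."""
    a, b = start, start
    for _ in range(steps):
        a, b = rng.choice(RULES)(a, b)
    return a, b


# Usage: a seeded generator makes a failing test case reproducible.
left, right = generate_pair(3, random.Random(0))
re.compile(left, re.ASCII)   # both results are valid patterns
re.compile(right, re.ASCII)
```

Pattern length roughly doubles with each step, so a small step count is probably enough for a property-based test.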

ahelwer

I find testing regex engines a very interesting topic, because it tackles some of the fundamental questions in property-based testing, like effective test-case generation, oracle problem etc. I'd like to see (and discuss) the other property tests you have; is the code open-source? – johanneslink Nov 18 '21 at 07:09
Another comment: Does the implementation of the analysis engine also use the same set of transformations to decide on equivalence? If so, you might have fallen prey to the "tautology trap" of automated testing. – johanneslink Nov 18 '21 at 07:14
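
One way to sidestep the tautology concern is to check generated pairs against an oracle that does not use the transformation list at all. The sketch below is an assumption about how that could look, not anything from the engine under discussion: a hypothetical helper `find_counterexample` enumerates every string over a small alphabet up to a length bound and compares the two patterns with Python's `re.fullmatch`. Agreement up to the bound is evidence, not proof, of equivalence; an exact check would compare the minimized automata instead.

```python
import itertools
import re


def find_counterexample(p, q, alphabet="ab", max_len=6):
    """Return a shortest string on which p and q disagree, or None if they
    agree on every string over `alphabet` of length <= max_len."""
    rp = re.compile(p, re.ASCII)
    rq = re.compile(q, re.ASCII)
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            # fullmatch tests membership of the whole string in the language
            if bool(rp.fullmatch(s)) != bool(rq.fullmatch(s)):
                return s
    return None


# The two example equivalences from the question pass the bounded check.
assert find_counterexample("a*a", "aa*") is None
assert find_counterexample("(ab)*a", "a(ba)*") is None
```

The number of strings grows as the alphabet size raised to `max_len`, so both should stay small.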
