
I want to construct a regular expression (in the style of lex, with a more OCaml-like syntax) for a class of strings in which the four characters `[`, `]`, `#` and `'` are allowed only when they are preceded by the escape character `'`.

Here are some valid examples:

  • `'#Data`, `abc'#Headers`, `abc'#Totals'[efg`, `123'#Totals']efg`, `abc`, `123`

Here are some non-valid examples:

  • `#Data`, `abc#Headers`, `abc#Totals[efg`, `123#Totals]efg`, `'#Totals[efg`

I hope the definition is clear. First, does anyone know how to construct such a regular expression? Second, does anyone know how to write it (in the style of lex, with a more OCaml-like syntax) so that it is accepted by ocamllex?

SoftTimur
  • Hi, interesting. Perhaps a regex that looks for an optional escape character followed by anything might work: `^[^\[\]#']*('.*)?$` – IronMan Aug 17 '20 at 23:59
  • @IronMan Thank you for your comment. I tested your suggestion at https://www.regextester.com/, and it seems to consider `'#Totals[efg` valid. However, `'#Totals[efg` is not valid for me because `[` is not preceded by `'`. – SoftTimur Aug 18 '20 at 00:19
  • Perhaps one or more non-special chars followed by an optional escape sequence `^([^\[\]#']*('[\[\]#'])*)*$` – IronMan Aug 18 '20 at 00:49
  • The lex regex is `([^][#']|'[][#'])*`. (Or change `*` to `+` if you don't allow empty strings.) But I gather that the "more OCaml-like syntax" requires escaping some of those characters, at least the apostrophe. – rici Aug 18 '20 at 00:55
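rici's one-line regex can be sanity-checked without a lexer generator. Below is a minimal hand-written OCaml scan that should accept the same strings as `([^][#']|'[][#'])*`. It is a sketch, not part of the thread: `is_special` and `is_valid` are hypothetical names, and it assumes every character other than the four special ones is ordinary (including spaces, commas and upper-case letters, which appear in the question's examples). No backtracking is needed, because an apostrophe can only ever start an escape sequence. In ocamllex syntax the same regex would presumably be written `([^ '[' ']' '#' '\''] | '\'' ['[' ']' '#' '\''])*`, but treat that translation as a sketch too.

```ocaml
(* Hypothetical helper: the four characters that must be escaped. *)
let is_special c = c = '[' || c = ']' || c = '#' || c = '\''

(* Accepts the strings matched by ([^][#']|'[][#'])* :
   a non-special character stands on its own, and a special character
   is allowed only as the second half of a two-character escape
   sequence that starts with an apostrophe. *)
let is_valid (s : string) : bool =
  let n = String.length s in
  let rec scan i =
    if i >= n then true                          (* consumed everything *)
    else if s.[i] = '\'' then
      (* escape character: the next character must exist and be special *)
      i + 1 < n && is_special s.[i + 1] && scan (i + 2)
    else if is_special s.[i] then false          (* unescaped [ ] or # *)
    else scan (i + 1)                            (* ordinary character *)
  in
  scan 0

let () =
  List.iter
    (fun s -> Printf.printf "%-20s %b\n" s (is_valid s))
    [ "'#Data"; "abc'#Headers"; "abc'#Totals'[efg"; "123'#Totals']efg";
      "#Data"; "abc#Totals[efg"; "'#Totals[efg" ]
```

Run as a script, it should print `true` for the first four strings and `false` for the last three.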

1 Answer


You don't say what the accepted strings look like, other than through a few examples. Just for concreteness, let's say that lower-case letters and digits are allowed, and the 4 special characters are allowed only if preceded by `'`.

This, then, is described by the Kleene closure of a set of 36 one-character strings (the 26 lower-case letters and 10 digits) and 4 two-character strings (`'` followed by one of the special characters).

In ocamllex syntax, it looks like this:

 (['a' - 'z' '0' - '9'] | '\'' ['\'' '#' '[' ']'])*
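To make the answer's assumed alphabet explicit, the regex can be mirrored by a small OCaml function, one branch per alternative. This is an illustrative sketch, not part of the original answer; `is_plain`, `is_escapable` and `matches_answer` are hypothetical names.

```ocaml
(* Mirrors the ocamllex regex
     (['a' - 'z' '0' - '9'] | '\'' ['\'' '#' '[' ']'])*
   under the stated assumption that the only ordinary characters
   are lower-case letters and digits. *)
let is_plain c = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')

(* The characters that may follow the escape character. *)
let is_escapable c = c = '\'' || c = '#' || c = '[' || c = ']'

let matches_answer (s : string) : bool =
  let n = String.length s in
  let rec scan i =
    if i = n then true                           (* Kleene star may stop here *)
    else if is_plain s.[i] then scan (i + 1)     (* first alternative *)
    else
      (* second alternative: an apostrophe followed by an escapable char *)
      s.[i] = '\'' && i + 1 < n && is_escapable s.[i + 1] && scan (i + 2)
  in
  scan 0
```

Under this assumption a string such as `'#Data` is rejected, because `D` is neither a lower-case letter nor a digit. To accept every other character, as the question's examples seem to require, the first character class could be replaced by an ocamllex complement class such as `[^ '[' ']' '#' '\'']`, which corresponds to the `[^][#']` part of rici's lex regex.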
Jeffrey Scofield