There is no way of doing this with either a regular expression or a CFG other than enumerating all the possibilities, which is a factorial amount of work. (By grouping, the actual size can be reduced, but it is still exponential.) If you only have one instance, and there are only three tokens, then listing is probably the easiest solution.
But if there are various tokens, and you might want to expand the list in the future, it's probably easier to allow all combination of tokens, but associate a bitmap with the token-list to make it easy to check for duplication, which presumably should result in an error message.
Here's a simple flex
solution for the precise case you mention. (In my original, I duplicated a lot of code, but I think the following is easier to read.) The <MODS>
start condition is triggered by the first appearance of [&^!]
and serves to absorb the rest of them; when any other character is encountered, it is marked to be rescanned (yyless(0)
) and the current mask of modifiers is returned.
%{
// The MODS token has %type <ModMask>, and the associated
// semantic value will be the union of the enum bits.
typedef unsigned int ModMask;
enum { MOD_HAT=1, MOD_BANG=2, MOD_AMP=4 };
// This function maps a character to a modmask value.
// A real implementation would use a lookup table. I just included
// this definition so the snippet is usable.
ModMask tokenToMask(unsigned char c) {
return c == '^' ? MOD_HAT :
c == '!' ? MOD_BANG :
c == '&' ? MOD_AMP : 0;
%}
%x SC_MODS
%%
[&^!] { yylval.modmask = tokenToMask(yytext[0]; BEGIN(MODS); }
<MODS>[&^!] { ModMask m = tokenToMask(yytext[0];
if (yylval.modmask & m) {
yyerror("Duplicate modifier");
}
yylval.modmask |= m;
}
<MODS>.|\n { yyless(0); BEGIN(INITIAL); return MODS; }