What's the regular expression for an alphabet without the first occurrence of a letter?

Question

I am trying to use FLEX to recognize some regular expressions that I need. What I am looking for is given a set of characters, say [A-Z], I want a regular expression that can match the first letter no matter what it is, followed by a second letter that can be anything in [A-Z] besides the first letter.

For example, if I give you AB, you match it but if I give you AA you don't. So I am kind of looking for a regex that's something like [A-Z][A-Z^Besides what was picked in the first set].

How could this be implemented for more occurrences of letters? Say if I want to match 3 letters without each new letter being anything from the previous ones. For instance ABC but not AAB.

Thank you!

rici · Accepted Answer · 2021-01-03T19:58:35.697

5

(Mathematical) regular expressions have no context. In (f)lex -- where regular expressions are actually regular, unlike most regex libraries -- there is no such thing as a back-reference, positive or negative.

So the only way to accomplish your goal with flex patterns is to enumerate the possibilities, which is tedious for two letters and impractical for more. The two-letter case would be something like this (abbreviated):

A[B-Z]|B[AC-Z]|C[ABD-Z]|D[A-CE-Z]|…|Z[A-Y]
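
To give a feel for how long the full enumeration gets, here is a minimal sketch in C that writes the two-letter pattern out for any letter range. The function name `build_distinct_pair_pattern` and its interface are made up for this illustration and are not part of flex. It lists each class letter by letter instead of using ranges like `AC-Z`, which is longer but equivalent.

```c
#include <stddef.h>

/* Hypothetical helper: writes a pattern such as "A[BC]|B[AC]|C[AB]" for the
 * letters first..last into out (capacity cap, including the final NUL).
 * Returns 0 on success, -1 if the range has fewer than two letters or the
 * buffer is too small (out is then left unterminated). */
int build_distinct_pair_pattern(char first, char last, char *out, size_t cap)
{
    size_t pos = 0;

    if (last <= first)
        return -1;

    for (int c = first; c <= last; c++) {
        /* One alternative: optional '|', the letter, '[', every other
         * letter in the range, ']'. */
        size_t need = (c != first ? 1u : 0u) + 3u + (size_t)(last - first);
        if (pos + need + 1 > cap)
            return -1;

        if (c != first)
            out[pos++] = '|';
        out[pos++] = (char)c;
        out[pos++] = '[';
        for (int d = first; d <= last; d++) {
            if (d != c)
                out[pos++] = (char)d;
        }
        out[pos++] = ']';
    }
    out[pos] = '\0';
    return 0;
}
```

Called with `'A'` and `'C'` it produces `A[BC]|B[AC]|C[AB]`. The output grows quadratically with the size of the alphabet, and it only covers the two-letter case.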

The inverse expression also has 26 cases but is easier to type (and read). You could rely on (f)lex's matching rule (the longest match wins, and on a tie the pattern listed first wins) to make use of it:

AA|BB|CC|DD|…|ZZ    { /* Two identical letters */ }
[[:upper:]]{2}  { /* This is the match */ }

Probably, neither of those is the best solution. However, I don't think I can give better advice without knowing more specifics. The key is knowing what action you want to take if the letters do match, which you don't specify. And what the other patterns are. (Recall that a lexical scanner is intended to divide the input into tokens, although you are free to ignore a token once it is identified.)

Flex does come with a number of useful features which can be used for more flexible token handling, including yyless (to rescan part or all of the token), yymore (to combine the match with the next token), and unput (to insert a character into the input stream). There is also REJECT, but you should try other solutions first. See the flex manual chapter on actions for more details.

So the simplest solution might be to just match any two capital letters, and then in the action check whether or not they are the same.
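
A minimal sketch of that check in C, written so that it also works for tokens longer than two letters. The helper name `has_repeated_letter` is hypothetical. `yytext` and `yyleng` are the variables flex provides for the matched text and its length. The sketch assumes an ASCII-compatible encoding in which `'A'..'Z'` are contiguous.

```c
#include <stddef.h>

/* Returns 1 if any uppercase letter in s[0..len-1] occurs more than once,
 * otherwise 0. Characters outside 'A'..'Z' are ignored; a pattern such as
 * [[:upper:]]{2} is assumed to have excluded them already. */
int has_repeated_letter(const char *s, size_t len)
{
    unsigned long seen = 0;  /* bit i set means letter 'A' + i was seen */

    for (size_t i = 0; i < len; i++) {
        if (s[i] < 'A' || s[i] > 'Z')
            continue;
        unsigned long bit = 1UL << (s[i] - 'A');
        if (seen & bit)
            return 1;
        seen |= bit;
    }
    return 0;
}

/* Hypothetical use in a flex rule (shown as a comment, not compiled here):
 *
 *   [[:upper:]]{2}  {
 *       if (has_repeated_letter(yytext, (size_t)yyleng)) {
 *           // two identical letters
 *       } else {
 *           // this is the match
 *       }
 *   }
 */
```

The pattern stays trivial and the distinctness test moves into ordinary C, so the same helper can serve a three-letter rule such as `[[:upper:]]{3}`.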

edited Jan 03 '21 at 19:58

answered Jan 03 '21 at 15:40


Just to give you a little bit more context. The problem is that I have a set from [A-F] and I need to match the input of geometrical shapes. For example, I need to match "point A" but not "point AB", then I need to match "line CB" but not "line AA", and so on from a triangle up to an octagon. So basically, for each rule, I need to match input that has the exact number of letters for each shape and no second occurrence of the same letter. So yeah, what I am trying to do is match the input that has the exact number of letters and then check that each letter is not any of the previous ones. – Toxicone 7 Jan 04 '21 at 17:33
@HeyYoubooo: and what do you want to do if there is a repeated letter? Once again, the goal of a lexical scanner is to split the input into a sequence of tokens. Since every input is possible, the scanner must handle every input. How will it handle `AA`? If it just needs to produce an error, you can check for repeated letters after tokenisation. The patterns only need to distinguish the two cases if that changes the way the input is split into tokens. Also: clarifications should be made by editing the question. Not everyone reads comments. – rici Jan 04 '21 at 21:27
Alright, will do this. Can you provide me with a link on how to check for letter repetition after tokenization? You said something about some functions that lex uses. I am kinda new to this and thus don't know much. What I would do is, if I find input that's correct, use some C to check each letter and then output a message if no letter is repeated. – Toxicone 7 Jan 04 '21 at 22:36
@HeyYoubooo: I did give a link for the flex actions, but I still don't really know what it is you are trying to do. Checking for repetitions in a character string is a common C exercise :-) In the case where there are two characters, it's trivial: `if (yytext[0] == yytext[1]) { /* the token has two identical characters */ }`. For longer strings, you can search SO for ideas. – rici Jan 04 '21 at 22:42
Eg: https://stackoverflow.com/questions/62723673/how-to-check-for-repeated-characters-within-a-string-in-c – rici Jan 04 '21 at 22:48
It's for the compilers course; I described the problem before. I need the user to input, say, Triangle ABC, which is correct, and then I will output Accepted; if I get Triangle AAA or Triangle A as input, I will reject it. I need to do all of this in a flex program. So what I have in mind is to check for the correct length for each shape and then iterate through all the letters to see if there's a repetition. What I didn't know is how to get the input from flex into a C variable, but since you said yytext[index] it should be fairly easy. Thank you for the help man! <3 – Toxicone 7 Jan 04 '21 at 23:15
@HeyYoubooo: What "problem before"? (Just out of curiosity, since it seems you've identified the question you intended to ask. :-) ) – rici Jan 05 '21 at 01:49
Haha, the main problem is to identify expressions such as "Triangle ABC" but not "Triangle ABB". I needed a regex that identified "triangle XXX" with exactly 3 Xs, no fewer and no more. What I did to solve it is to identify expressions that have the correct number of Xs and then, using some C code and the yytext[] that you suggested, check for unique characters. I had no idea that yytext[] existed and wanted a regex that could identify a string in which none of the Xs was the same letter as another. Thank you again for your help – Toxicone 7 Jan 05 '21 at 12:44
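
For reference, the approach described in the last comment (check the letter count for the shape, then check that the letters are unique) might look roughly like the sketch below. Everything here is an assumption made for illustration: the function name `shape_is_valid`, the table layout, the single space between the shape name and the letters, and the case-insensitive shape name. Only the shapes actually named in the thread are listed.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct shape {
    const char *name;   /* lowercase shape name */
    size_t vertices;    /* number of letters the shape requires */
};

/* Only shapes mentioned in the thread; others would be added the same way. */
static const struct shape shapes[] = {
    { "point", 1 }, { "line", 2 }, { "triangle", 3 }, { "octagon", 8 },
};

/* Compares the first n characters of a, ignoring case, with all of b. */
static int name_matches(const char *a, size_t n, const char *b)
{
    if (strlen(b) != n)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != b[i])
            return 0;
    }
    return 1;
}

/* Returns 1 if text has the form "<shape> <letters>" with exactly the
 * required number of distinct uppercase letters, otherwise 0. */
int shape_is_valid(const char *text)
{
    const char *space = strchr(text, ' ');
    if (space == NULL)
        return 0;

    size_t name_len = (size_t)(space - text);
    const char *letters = space + 1;
    size_t count = strlen(letters);

    for (size_t i = 0; i < sizeof shapes / sizeof shapes[0]; i++) {
        if (!name_matches(text, name_len, shapes[i].name))
            continue;
        if (count != shapes[i].vertices)
            return 0;               /* wrong number of letters */

        unsigned long seen = 0;     /* bit i set means 'A' + i was seen */
        for (size_t j = 0; j < count; j++) {
            if (letters[j] < 'A' || letters[j] > 'Z')
                return 0;           /* not an uppercase letter */
            unsigned long bit = 1UL << (letters[j] - 'A');
            if (seen & bit)
                return 0;           /* repeated letter */
            seen |= bit;
        }
        return 1;
    }
    return 0;                       /* unknown shape name */
}
```

In a flex program the string passed in would typically be `yytext` from a rule that matches a word followed by a run of capital letters. The function accepts "Triangle ABC" and rejects "Triangle AAA" and "Triangle A".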
