How to exclude a token/expression from another token/expression in GnuWin32 Flex

Question

I want to exclude some keywords from my variable token. My variable token is:

variable [a-z|A-Z]+[a-z|A-Z|0-9]*

and my keyword definitions are:

Datatype "int"|"double"|"char"|"void"
KEYWORD "include"|"define"|{Datatype}|"return"|"if"|"else"|"elif"|"loop"|"while"|"run"|"new"

I tried {variable}^{KEYWORD} and ^{KEYWORD}{variable}, but neither works.

I want to define the variable token in such a way that it can't match anything from KEYWORD. How can I do that?

You might find it useful to read through the short description of [Flex patterns](http://westes.github.io/flex/manual/Patterns.html) in which you will discover that `|` in `[a-z|A-Z]` only adds the character `|` to the set of possibilities, and that `^` means that the pattern should match at the beginning of a line (but only if it appears at the beginning of the pattern). You'll also find an example of a rule set which matches keywords and identifiers in the [examples section](http://westes.github.io/flex/manual/Simple-Examples.html). — rici, Sep 25 '19 at 23:58
But on GeeksforGeeks I read that [^A-Z] means all characters except upper-case letters. — sabertooth, Sep 27 '19 at 04:27
that's what `^` means when it's the first symbol in a *character class*. Please read the actual documentation for Flex, which I linked. — rici, Sep 27 '19 at 04:53
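The two points about `|` and `^` can be checked outside Flex. The sketch below uses the POSIX `regcomp`/`regexec` functions, which are not Flex and may not be available on GnuWin32, but bracket expressions behave the same way in this respect. The helper name `matches` is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if text matches the POSIX extended
   regular expression in pattern, 0 if it does not, and -1 if the
   pattern fails to compile. */
static int matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, `matches("^[a-z|A-Z]+$", "a|b")` succeeds because `|` is just another member of the set, while `matches("^[a-zA-Z]+$", "a|b")` fails. Inside brackets a leading `^` negates the set, and at the start of the pattern it anchors the match.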
Yes. If I could have figured that out from reading the docs, I wouldn't have asked here. I solved it using an inner loop to check for keywords and was looking for a Flex way to solve this. Anyway, thanks for your help. — sabertooth, Oct 02 '19 at 00:08
scroll down to the end of the linked page. There's a flex file. Look at the two lines near the top of the last example. One has a pattern consisting of keywords. The next has the pattern `{ID}`, which is like your `variable`. The order of the lines is important. The page on how flex matches explains why. — rici, Oct 02 '19 at 00:13
As you will see in other flex examples, it is more common for each keyword to have its own rule because the keywords are syntactically significant. (The grammar for different keywords is different.) But the simple example on that page is stunningly similar to your question, so it seemed like additional details were not necessary. — rici, Oct 02 '19 at 00:17
In that example, the keyword rule is checked first and then the identifier rule. But I wanted to make a token which automatically excludes keywords from identifiers, because a normal variable declaration can be like ``int x,a;`` but someone can write ``int a,int;``. I wanted to make sure this won't match the rule ``{datatype}{space}({variable},{0,1})*;`` — sabertooth, Oct 03 '19 at 04:32
`{datatype}{space}({variable},{0,1})*;` is not a token; it's a syntactically complex stream which needs to be parsed. Normally that's done with yacc/bison or some other parser generator. You will indeed find that flex's tokenising model does not help you much for recognising things which are more complicated than tokens. — rici, Oct 03 '19 at 04:48
By the way, we would usually say that `{xyz}` is a macro, not a token. A "token" is what is matched by an entire flex rule. — rici, Oct 03 '19 at 04:50
I mention that mostly to explain why your original question was not really understood. But the only real answer, now that I (think) I know what you mean, is that that is not a good way to parse your input, so you might want to think about a more traditional style. Flex does not implement regular expression difference or intersection operators, nor does it implement lookahead assertions (which is how many regex libraries allow difference/intersect matches), nor any other feature which would slow down the matching algorithm. — rici, Oct 03 '19 at 05:02
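Since Flex has no difference operator, the usual way to get "identifier but not keyword" is to let keywords take priority. Either the keyword rule is listed before the identifier rule, as in the linked example, or every word is matched by the identifier pattern and classified afterwards. A minimal sketch of the second variant in C follows. The names `TokenKind`, `classify_word` and `is_identifier` are hypothetical and come from neither the question nor Flex.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds, for illustration only. */
typedef enum { TOK_KEYWORD, TOK_IDENTIFIER, TOK_INVALID } TokenKind;

/* Keyword list copied from the question's Datatype and KEYWORD definitions. */
static const char *const keywords[] = {
    "int", "double", "char", "void", "include", "define", "return",
    "if", "else", "elif", "loop", "while", "run", "new"
};

/* Returns 1 if word has the shape [a-zA-Z]+[a-zA-Z0-9]*, otherwise 0. */
static int is_identifier(const char *word)
{
    if (!isalpha((unsigned char)word[0]))
        return 0;
    for (size_t i = 1; word[i] != '\0'; i++)
        if (!isalnum((unsigned char)word[i]))
            return 0;
    return 1;
}

/* Classifies a whole word; keywords take priority over identifiers. */
static TokenKind classify_word(const char *word)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(word, keywords[i]) == 0)
            return TOK_KEYWORD;
    return is_identifier(word) ? TOK_IDENTIFIER : TOK_INVALID;
}
```

In a scanner, the action of the `{variable}` rule could call `classify_word(yytext)`. Because the comparison is against the whole word, `interior` is still an identifier even though it begins with `int`.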

score -1 · Answer 1 · answered Oct 02 '19 at 00:14

I couldn't find a Flex way to solve this, so I wrote a function that extracts the words from a string and checks each one against the keyword list.

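#include <string.h>

int isKeyword(char *c); /* defined below */

/* Scans yytext for words that end at a terminator and records the keywords.
   out and idx are globals defined elsewhere in the scanner. */
void getKeyword(char *yytext){
    char temp[109];
    int len=(int)strlen(yytext); /* computed once instead of on every iteration */
    for(int i=0;i<len;i++){
        for(int j=i+1;j<=len;j++){
            if(yytext[j]=='\n' || yytext[j]==' ' || yytext[j]=='(' || yytext[j]==';' || yytext[j]==','){
                /* terminator found: the candidate word is yytext[i..j-1] */
                int id=0;
                int k=i;
                while(k<j && yytext[k]==' ')k++; /* skip leading spaces */
                int l=j-1;
                while(l>=k && yytext[l]==' ')l--; /* skip trailing spaces */
                for(;k<=l && id<(int)sizeof(temp)-1;k++){
                    temp[id++]=yytext[k]; /* store the word without overflowing temp */
                }
                temp[id]='\0';

                if(isKeyword(temp)){ /* checker function */
                    i=j-1;
                    /* save it to a char array, including the terminating '\0' */
                    memcpy(out[6][idx[6]++],temp,strlen(temp)+1);
                    break;
                }
            }
        }
    }
}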

The isKeyword function:

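/* Returns 1 if c is one of the keywords from the question, otherwise 0.
   "run" and "new" were missing from the original list. */
int isKeyword(char *c){
    if(!strcmp("return",c) || !strcmp("include",c) || !strcmp("define",c) ||
       !strcmp("int",c) || !strcmp("double",c) || !strcmp("char",c) ||
       !strcmp("void",c) || !strcmp("if",c) || !strcmp("elif",c) ||
       !strcmp("else",c) || !strcmp("loop",c) || !strcmp("while",c) ||
       !strcmp("run",c) || !strcmp("new",c))
        return 1;
    return 0;
}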

This is quite wrong. Rici showed you exactly how to do this in flex in his comment. It is in the documentation. — Brian Tompsett - 汤莱恩, Oct 02 '19 at 07:47
I wanted to make a token which automatically excludes keywords. But that example checks for a keyword first and then for an ID, so it doesn't solve my problem either. — sabertooth, Oct 03 '19 at 04:35
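Once keywords are recognised before identifiers, a declaration such as `int a,int;` can be rejected at the token level instead of inside one large pattern. The rough sketch below assumes the input has already been split into an array of token strings. All function names are hypothetical, and a real project would more likely use yacc/bison, as suggested in the comments on the question.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if w is one of the n strings in list. */
static int in_list(const char *w, const char *const list[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(w, list[i]) == 0)
            return 1;
    return 0;
}

/* Data types from the question's Datatype definition. */
static int is_datatype(const char *w)
{
    static const char *const types[] = { "int", "double", "char", "void" };
    return in_list(w, types, sizeof types / sizeof types[0]);
}

/* A variable has the shape [a-zA-Z]+[a-zA-Z0-9]* and is not a keyword. */
static int is_variable(const char *w)
{
    static const char *const other[] = { "include", "define", "return", "if",
        "else", "elif", "loop", "while", "run", "new" };
    if (is_datatype(w) || in_list(w, other, sizeof other / sizeof other[0]))
        return 0;
    if (!isalpha((unsigned char)w[0]))
        return 0;
    for (size_t i = 1; w[i] != '\0'; i++)
        if (!isalnum((unsigned char)w[i]))
            return 0;
    return 1;
}

/* Accepts the token sequence: datatype variable ("," variable)* ";" */
static int is_declaration(const char *const tokens[], size_t n)
{
    if (n < 3 || !is_datatype(tokens[0]) || strcmp(tokens[n - 1], ";") != 0)
        return 0;
    /* tokens[1..n-2] must alternate: variable, ",", variable, ... */
    for (size_t i = 1; i < n - 1; i++) {
        if (i % 2 == 1) {
            if (!is_variable(tokens[i]))
                return 0;
        } else if (strcmp(tokens[i], ",") != 0) {
            return 0;
        }
    }
    /* The token before ";" must be a variable, not a trailing comma. */
    return (n - 2) % 2 == 1;
}
```

For the tokens of `int x,a;` the check succeeds. For `int a,int;` it fails, because the second `int` is a data type and therefore never counts as a variable.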
