
I am writing a lex program. I have declared three char pointers, and I assign `yytext` to each of them when the corresponding token matches the criteria. But when I print them afterwards, the first one prints the values of all three, the second the last two, and the last only its own. Why is this happening? Here is my code:

```lex
%{
#include<stdio.h>
#include<string.h>

int for_cond = 0;
char *cond1, *cond2, *cond3;
char * for_body = "";
//char * loop = "";
%}
VAR [a-zA-Z_]+[a-zA-Z0-9_]*
%%
for[ ]*\( {for_cond++;}
int[ ]+{VAR}[ ]*\=[ ]*[0-9]+ {if(for_cond==1){cond1 = yytext;}else if(for_cond==4){for_body = strcat(for_body,yytext);}}
; {if(for_cond==1||for_cond==2){for_cond++;} else if(for_cond==4){for_body = strcat(for_body,yytext);}}
{VAR}[ ]*(\<|\>|\<\=|\>\=|\=\=)[ ]*[0-9]+ {if(for_cond==2){cond2 = yytext;}else if(for_cond==4){for_body = strcat(for_body,yytext);}}
{VAR}[ ]*((\+\+|\-\-)|((\+\=|\-\=|\*\=|\/\=)[ ]*({VAR}|[0-9]+))) {if(for_cond==3){cond3 = yytext;}else if(for_cond==4){for_body = strcat(for_body,yytext);}}
%%
int yywrap(void){ return 1; }  /* 1 tells the scanner there is no more input */
int main(){

    yylex();
    printf("cond1 = %s\ncond2 = %s\ncond3 = %s\n", cond1, cond2, cond3);

    return 0;
}
```

Example input:

for(int i=0;i<=2;i++)

Expected output:

cond1 = int i=0

cond2 = i<=2

cond3 = i++

What I am getting:

cond1 = int i=0;i<=2;i++)

cond2 = i<=2;i++)

cond3 = i++)

Why is this happening? How do I fix this?

Shantanu Shinde

1 Answer

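`yytext` points into a buffer internal to the lexer, and the string it points to is only valid during the lexer action. (More precisely, its lifetime starts with the lexer action and ends just before the next token is scanned.) Flex marks the end of the current token by temporarily overwriting the following character with a NUL, and restores that character before it continues scanning. So a pointer saved from `yytext` ends up pointing at the token followed by the rest of the input, which is exactly what you are seeing.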
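To make this concrete, here is a standalone C simulation (not flex's actual code). It assumes, as a simplification of flex's internals, that the scanner ends each token by temporarily writing a NUL over the next character and restores that character before matching the next token. The buffer contents, the hard-coded token boundaries and the function name `scan_and_alias` are hypothetical, chosen to mirror the question's input and actions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simulated scanner buffer holding the question's example input
   (without the trailing newline, to keep the strings short). */
static char buffer[] = "int i=0;i<=2;i++)";

/* Hypothetical, hard-coded token boundaries for that input:
   "int i=0"  ";"  "i<=2"  ";"  "i++"  ")"                  */
static const size_t tok_start[] = {0, 7, 8, 12, 13, 16};
static const size_t tok_len[]   = {7, 1, 4, 1, 3, 1};
#define NTOKENS (sizeof tok_start / sizeof tok_start[0])

/* Mimics the assumed flex strategy: before each "action", write a NUL just
   past the token; before matching the next token, put the old character back.
   Like the question's actions, it saves raw pointers for tokens 0, 2 and 4. */
void scan_and_alias(char *saved[3])
{
    char *hold_pos = NULL;   /* where the temporary NUL was written */
    char hold_char = '\0';   /* the character that the NUL replaced */
    size_t n = 0;

    for (size_t t = 0; t < NTOKENS; t++) {
        if (hold_pos != NULL)
            *hold_pos = hold_char;          /* undo the previous token's NUL */

        char *text = buffer + tok_start[t]; /* plays the role of yytext */
        hold_pos = text + tok_len[t];
        hold_char = *hold_pos;
        *hold_pos = '\0';                   /* text now reads as just this token */

        if (t % 2 == 0) {                   /* the three loop conditions */
            assert(n < 3);
            saved[n++] = text;              /* the bug: keeps a pointer, copies nothing */
        }
    }
    if (hold_pos != NULL)
        *hold_pos = hold_char;              /* restored when scanning continues */
}
```

After the call, `saved[0]`, `saved[1]` and `saved[2]` read `int i=0;i<=2;i++)`, `i<=2;i++)` and `i++)`, the same strings the question's program prints. Each pointer still marks where its token starts, but the NUL that ended the token is gone.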

In other words, you cannot save `yytext` as a pointer. If you want to keep the string for later use, you must copy the characters it points to. If you have `strdup`, you can use it to make the copy (but don't forget to `free()` the copy when you no longer need it). If you don't have `strdup`, or for whatever reason don't want to use it, you can allocate the space yourself:

```c
char *theToken = malloc(yyleng + 1);  /* +1 for the NUL; needs <stdlib.h> */
strcpy(theToken, yytext);
```
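A minimal sketch wrapping that allocation in a reusable helper. The name `copy_token` is hypothetical (it is not part of flex or the C library); it copies exactly `len` bytes with `memcpy`, so in a flex action you would pass `yytext` and `yyleng`, and it returns `NULL` if `malloc` fails.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, NUL-terminated copy of the
   first len bytes of text, or NULL if malloc fails. The caller must free() it. */
char *copy_token(const char *text, size_t len)
{
    assert(text != NULL);
    char *copy = malloc(len + 1);   /* +1 for the terminating NUL */
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, len);
    copy[len] = '\0';
    return copy;
}
```

In the question's scanner, the first rule's action would then store `cond1 = copy_token(yytext, yyleng);`. The copy stays valid after the lexer moves on to the next token, until you `free()` it.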

It's also worth mentioning that, since `for_body` points to an immutable string of length 0 (`char * for_body = "";`), attempting to append text to it is undefined behaviour:

```c
strcat(for_body,yytext);
```

On many platforms, that call will segfault, because the initialization leaves `for_body` pointing to read-only memory. And even if that memory happens to be writable, the bytes `strcat` writes past the empty string are not part of the array `for_body` points to; they belong to some other object, whose value the call will destroy.
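Since a C array can't grow, accumulating text such as `for_body` means keeping your own heap buffer and reallocating a larger one when it fills up. A minimal sketch, assuming a hypothetical `struct strbuf` and `strbuf_append` (neither is part of flex or the C library), that doubles the capacity as needed:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string: tracks its own length and capacity. */
struct strbuf {
    char  *data;   /* NUL-terminated contents, or NULL before the first append */
    size_t len;    /* characters stored, excluding the NUL */
    size_t cap;    /* bytes allocated for data */
};

/* Appends len bytes of text. Returns 0 on success, or -1 if realloc fails,
   in which case sb is left unchanged. */
int strbuf_append(struct strbuf *sb, const char *text, size_t len)
{
    assert(sb != NULL && text != NULL);
    size_t need = sb->len + len + 1;        /* +1 for the NUL */
    if (need > sb->cap) {
        size_t newcap = sb->cap ? sb->cap : 16;
        while (newcap < need)
            newcap *= 2;                    /* grow geometrically */
        char *p = realloc(sb->data, newcap);  /* realloc(NULL, n) acts like malloc */
        if (p == NULL)
            return -1;
        sb->data = p;
        sb->cap = newcap;
    }
    memcpy(sb->data + sb->len, text, len);
    sb->len += len;
    sb->data[sb->len] = '\0';
    return 0;
}
```

With this, `for_body` would be declared as `struct strbuf for_body = {NULL, 0, 0};` instead of `char *`, each `strcat` call in the actions would become `strbuf_append(&for_body, yytext, yyleng);`, and `for_body.data` would be freed once you are done with it.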

rici
  • so basically I am doing a deep copy of `yytext` when I should be doing a shallow copy, right? also, if the method I am using to update `for_body` is not correct, then what is the correct way? I want to dynamically keep adding text to it, similar to how we can do using `+` for C++ strings – Shantanu Shinde Feb 20 '20 at 00:21
  • You are not copying `yytext` at all. You should be copying it. C doesn't have strings and there is no way to grow a C array. All you can do is allocate a bigger array and copy. Nothing stops you from using C++ with Flex; the generated code can be compiled as C++ and it will work fine. You might find that a more comfortable solution. If you want to use C, you'll need to read up on dynamic allocation. – rici Feb 20 '20 at 04:12
  • so what does `cond1 = yytext` do exactly? copy the memory location of the `char` array pointed by `yytext` to `cond1`? – Shantanu Shinde Feb 20 '20 at 04:31
  • @ShantanuShinde: a pointer is a primitive value, so yes; afterwards, `cond1` and `yytext` point to the same place. (Which is somewhere in the middle of flex's internal buffer.) No characters are copied. That's quite different from `std::string cond1 = yytext;`, which would copy characters. – rici Feb 20 '20 at 05:03