
I'm trying to debug why my variable mystring is not known when I think it should be, according to an earlier question.

Is the bug in the grammar or in the code?

(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/dac/ClionProjects/openshell/openshell 
'PATH' is set to /home/dac/proj/google-cloud-sdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games.
> echo 'a b'
lexcode 3 Text echo mystring (null)
Becho
testlexcode 4 Text ' mystring (null)
lexcode 1 Text 
 mystring (null)
argument ::= ARGUMENT .
argumentList ::= argument .
command ::= FILENAME argumentList .
commandList ::= command .
 {(null)} {echo} {(null)}

Program received signal SIGSEGV, Segmentation fault.
0x0000000000402308 in main ()
(gdb) 

My grammar is

%{
    #include "shellparser.h"
    #include <string.h>
    char *mystring;
%}

%option reentrant
%option noyywrap

%x SINGLE_QUOTED
%x DOUBLE_QUOTED

%%

"|"                     { return PIPE; }

[ \t\r]                 { }
[\n]                    { return EOL; }

[a-zA-Z0-9_\.\-]+       { return FILENAME; }

[']                     { BEGIN(SINGLE_QUOTED); }
<SINGLE_QUOTED>[^']+    { printf("test");mystring = strdup(yytext); }

<SINGLE_QUOTED>[']      { BEGIN(INITIAL);
      /*  mystring contains the whole string now,
           yytext contains only "'" */
                          return ARGUMENT; }
<SINGLE_QUOTED><<EOF>>  { return -1; }

["]                     { BEGIN(DOUBLE_QUOTED); }
<DOUBLE_QUOTED>[^"]+    { }
<DOUBLE_QUOTED>["]      { BEGIN(INITIAL); return ARGUMENT; }
<DOUBLE_QUOTED><<EOF>>  { return -1; }

[^ \t\r\n|'"]+          { return ARGUMENT; }

%%

Then my main loop is

yylex_init(&scanner);
yyset_in(stdin, scanner);

shellParser = ParseAlloc(malloc);

params[0] = NULL;
printf("> ");
i=1;
do {
    lexCode = yylex(scanner);
    text = strdup(yyget_text(scanner));
    printf("lexcode %i Text %s mystring %s\n", lexCode, text, mystring);
    if (lexCode == 4) {
        params[i++] = mystring;
        if (strcmp(text, "\'\0")) {
            params[i++] = mystring;
        }
    } else
    if (lexCode != EOL) {
        params[i++] = text;
        printf("B%s\n", text);
    }
    Parse(shellParser, lexCode, text);
    if (lexCode == EOL) {
        dump_argv("Before exec_arguments", i, params);
        exec_arguments(i, params);
        corpse_collector();
        Parse(shellParser, 0, NULL);
        i=1;
    }
} while (lexCode > 0);
if (-1 == lexCode) {
    fprintf(stderr, "The scanner encountered an error.\n");
}
yylex_destroy(scanner);
ParseFree(shellParser, free);

Why is mystring null when I expect it to be something? I get a segmentation fault:

$ ./openshell 
'PATH' is set to /home/dac/proj/google-cloud-sdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games.
> echo 'a b'
lexcode 3 Text echo mystring (null)
Becho
testlexcode 4 Text ' mystring (null)
lexcode 1 Text 
 mystring (null)
argument ::= ARGUMENT .
argumentList ::= argument .
command ::= FILENAME argumentList .
commandList ::= command .
 {(null)} {echo} {(null)}
Segmentation fault (core dumped)

The whole project is on my GitHub.

Niklas Rosencrantz
  • "`mystring` is not known" seems a poor description of the problem. The variable is certainly known in the sense that a declaration is in scope everywhere that variable is referenced, else your code would not compile. If the problem is with `mystring` at all, then surely it's that its value is not what you expected. – John Bollinger Apr 20 '16 at 18:35
  • But why is this grammar not executed? `[^']+ { printf("test");mystring = strdup(yytext); }` I thought that it should write out the test msg and write to `mystring`. – Niklas Rosencrantz Apr 20 '16 at 18:44
  • @PaulOgilvie I type `echo 'a b'` into the interpreter and I get a segfault. I wanted it to print `test` and write to the `mystring` variable. – Niklas Rosencrantz Apr 20 '16 at 19:14
  • If it reaches that rule, then at least you will see "test" appearing from `printf("test");` If you don't see that, it didn't reach that rule. Of course, it first sees `echo`, for which I see no rule. – Paul Ogilvie Apr 20 '16 at 19:18
  • `echo` results in lexcode 3. It looks like `echo` gets interpreted correctly. Thing is that `echo foo` works but not `echo 'foo bar' ` – Niklas Rosencrantz Apr 20 '16 at 19:21

2 Answers


Because in

    lexCode = yylex(scanner);
    text = strdup(yyget_text(scanner));
    printf("lexcode %i Text %s mystring %s\n", lexCode, text, mystring);

mystring has not necessarily been set by yylex. There is only one rule that sets it, so generally it will (still) be NULL, causing the segfault.

Paul Ogilvie
  • But it looks like I set the string in the grammar: `[^']+ { printf("test");mystring = strdup(yytext); }` but then it doesn't even print the test msg. – Niklas Rosencrantz Apr 20 '16 at 19:09
  • So you know the lexer never gets to that rule.... The lexer returns some other token it has seen, and then it doesn't set `mystring`. – Paul Ogilvie Apr 20 '16 at 19:12

@PaulOgilvie explained pretty clearly why mystring might still be NULL after yylex() returns. As an otherwise-uninitialized global, mystring's initial value is NULL. After scanning the text "echo" from your example command, yylex() returns, not having set mystring, so at that point it is still NULL.
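
To illustrate the point about the global's initial value, here is a minimal sketch in plain C. `fake_lex`, `last_quoted` and `or_placeholder` are hypothetical stand-ins (for `yylex()`, `mystring` and a printing guard); they are not part of flex or of the code in the question, and the return codes 3 and 4 are simply borrowed from the output shown in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* File-scope pointer with no initializer: C guarantees it starts out NULL. */
char *last_quoted;

/* Hypothetical stand-in for yylex(): only the "quoted" path stores text,
   just as only one rule in the grammar assigns mystring. */
int fake_lex(const char *token, int is_quoted)
{
    static char storage[64];

    assert(token != NULL);
    if (is_quoted) {
        strncpy(storage, token, sizeof storage - 1);
        storage[sizeof storage - 1] = '\0';
        last_quoted = storage;
        return 4;   /* code seen for the quoted token in the output */
    }
    return 3;       /* code seen for "echo"; last_quoted is left untouched */
}

/* Substitute a placeholder so callers never hand NULL to "%s". */
const char *or_placeholder(const char *s)
{
    return s != NULL ? s : "(unset)";
}
```

Calling `fake_lex("echo", 0)` leaves `last_quoted` at NULL, which mirrors the first line of the output. Passing NULL to `%s` is undefined behavior in standard C (glibc happens to print `(null)`), so a guard like `or_placeholder` is the safer way to log such a value.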

Do note, however, that that does not appear to be the proximate cause of your segfault. Your output shows the program proceeding past that point and in fact executing the printf() call immediately preceding the assignment to mystring. The strdup() with which you compute a value for mystring deserves a closer look, because yytext does not point to a separately allocated C string; it points to the location of the text in flex's buffer. flex does NUL-terminate the token text for the duration of the rule's action, so strdup(yytext) copies exactly the token, but the pointer is only valid until the next call to yylex(), which is why a copy is needed at all.

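flex also tracks the length of the matched text in yyleng. In a reentrant scanner it is not a true global: it is available inside rule actions, and via yyget_leng(scanner) outside them. You can use it to make the copy with an explicit length. For example, you could do this: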

mystring = strndup(yytext, yyleng);
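
As a self-contained sketch of length-bounded copying outside flex: `copy_token` is a hypothetical helper, not a flex function, and the caller is assumed to supply a pointer into a larger buffer plus the token's length (the roles that yytext and yyleng play inside a rule action).

```c
#define _POSIX_C_SOURCE 200809L   /* for strndup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy exactly len bytes starting at start into a freshly allocated,
   NUL-terminated string. Returns NULL if allocation fails.
   The caller owns the result and must free() it. */
char *copy_token(const char *start, size_t len)
{
    assert(start != NULL);
    return strndup(start, len);
}
```

For instance, given the buffer `"echo 'a b'\n"`, calling `copy_token(buffer + 6, 3)` yields a separate string holding only the three characters between the quotes, regardless of what follows them in the buffer.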

With that said, your output seems to show the scanning proceeding to completion (receiving token EOL, with value 1), in which case the crash is probably occurring in dump_argv() or even after. From the output, I'm guessing that you have a wild pointer or perhaps a pointer to an unterminated string somewhere in there. It's hard to tell, because you do not present the code for those functions.

Update: You do, still, seem to have mystring in the main loop not seeing the assignment performed by your scanner. The only plausible explanation for this is that they are not the same mystring. Perhaps you are declaring a static or local mystring in the scope of the main loop you presented. Note, too, that flex's `%option reentrant` is intended to produce a scanner that avoids communicating via global variables, but you defeat that by introducing your own (mystring).
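
To show how two variables named mystring can silently diverge, here is a small sketch in plain C. It is only a guess at the kind of mistake involved, not a diagnosis of the actual project: `scanner_action`, `loop_with_local_shadow` and `loop_with_global` are hypothetical names, with `scanner_action` standing in for the rule action that assigns mystring.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* File-scope variable, like the one declared in the scanner's prologue. */
char *mystring;

/* Hypothetical stand-in for the rule action: it assigns the GLOBAL. */
void scanner_action(const char *token_text)
{
    assert(token_text != NULL);
    free(mystring);                  /* free(NULL) is a no-op */
    mystring = strdup(token_text);
}

/* A local declaration shadows the global, so the assignment made by
   scanner_action() is invisible here. Returns 1 if a value is seen. */
int loop_with_local_shadow(void)
{
    char *mystring = NULL;           /* different object, same name */
    scanner_action("a b");
    return mystring != NULL;
}

/* Without the local declaration, the name refers to the global. */
int loop_with_global(void)
{
    scanner_action("a b");
    return mystring != NULL;
}
```

The first function reports that nothing was set even though the "scanner" ran, which is the symptom in the question; the second sees the value. A `static char *mystring;` in a different source file would have the same effect across translation units.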

John Bollinger
  • It's not working. I updated the question with a link to all the code which I keep in github. I don't understand why it doesn't work. – Niklas Rosencrantz Apr 20 '16 at 19:36
  • @Programmer400, sorry, it doesn't work that way here. We generally insist on the code to which the question pertains appearing in the question itself. Moreover, your chances of getting a useful answer are greatly improved if you present a [mcve], which, really, we should have asked for from the beginning. Paul's is the best answer to the question as originally posed. If that turns out not to have been the one you really needed to ask, then perhaps you should accept Paul's answer and ask a new question -- with a *bona fide* MCVE this time. – John Bollinger Apr 20 '16 at 19:45
  • I get this error msg `error: ‘yyg’ undeclared (first use in this function) #define yytext yyg->yytext_r` – Niklas Rosencrantz Apr 20 '16 at 19:56