2

I am creating simple shell interpreter. I am creating this shell using a scanner and a parser using lex and yacc utility. The problem arises when i give an argument to command as it returns invalid option.For example when I type in ls -l it returns ls: invalid option -- ' even though i have checked the value of the arg_list which holds the value of arguments and command and it stores the correct argument.

Can someone, please, help me understand why I am getting this error? For my code, the lex scans the input and returns a token to the parser whenever a string is matched.

(This code of program only has the capability to run a single argument).

This is my lex specification file. "shell.l"

%{
    #include<stdio.h>
    #include<stdlib.h>
    #include"y.tab.h"
    extern int len;
%}
%option noyywrap
letter [a-zA-Z]+

%%
(\-)?{letter}                       {yylval.id=yytext;len=yyleng;return WORD;}
[ \t\n]                         {;}
.                           {return NOTOKEN;}

%%

This is my yacc specification file and a my main() as well. "shell.y"

%{
    #include<stdio.h>
    #include<stdlib.h>
    #include<unistd.h>
    extern int yylex();
    void yyerror();
    int append =0;
    void insertArg();
    void insertComm();
    char *arg_list[20];
    int i,count=1;
    int len;
    char command[20];
%}
%union{
    float f;
    char  *id;
}
%start C
%token <id> WORD
%token  NOTOKEN
%type <id> C A

%%
A:A WORD    {$$=$2;insertArg($2,len);count++;}
 |
 ;
C:WORD {$$=$1;insertComm($1,len);/*printf("%s\n",$$);*/} A  
 ;

%%

void insertComm(char *c,int len)
{   

    for(i=0;i<len;i++)
    {   
        command[i]=c[i];

    }
    arg_list[0]=&command[0];
}
void insertArg(char *arg,int len)
{
    arg_list[count]=&arg[0];
}
void yyerror(char *msg){
    fprintf(stderr,"%s\n",msg);
    exit(1);
}

int main()
{   
    int status;
    while(1){
    yyparse();
    //arg_list[count]='\0';
    printf("\n arg_list[0]= %s",arg_list[0]);
    for(i=0;arg_list[i]!='\0';i++)
    {
        printf("\n arg_list[%d]= %s",i,arg_list[i]);
    }
    //printf("%s",sizeof(arg_list[1]));
    execvp(arg_list[0],arg_list);
    }
}
  • 1
    The code you posted does not compile for me: "shell.y:28.9-10: error: $$ for the midrule at $2 of 'C' has no declared type" – sepp2k Apr 29 '18 at 15:50
  • 1
    You did a good job on creating an MCVE ([MCVE]). Well done! It could be better; many people do a lot worse. And it is particularly hard when you have Lex and Yacc as part of the mix. – Jonathan Leffler Apr 29 '18 at 16:49

1 Answers1

4

Diagnosis

Your code ends up including a newline after the -l in the argument, and ls is complaining that it doesn't have a newline as a valid option letter.

Diagnostic technique

I modified the printing code in your main to read:

printf("arg_list[0]= %s\n", arg_list[0]);
for (i = 0; arg_list[i] != NULL; i++)
{
    printf("arg_list[%d] = [%s]\n", i, arg_list[i]);
}
fflush(stdout);

It produced:

$ ./shell
ls -l
arg_list[0]= ls
arg_list[0] = [ls]
arg_list[1] = [-l
]
ls: illegal option -- 

usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
$

Notice that there is a newline after the -l and before the ] that marks the end of the string.

There are various changes to note in the code. First, the print format strings end with a newline, not start — that helps ensure that the output is output in a timely manner. Without a newline at the end, the output may be delayed indefinitely, until something does produce a newline.

Second, my compiler warned me about:

shellg.y: In function ‘main’:
shellg.y:60:24: warning: comparison between pointer and zero character constant [-Wpointer-compare]
     for(i=0;arg_list[i]!='\0';i++)
                        ^~
shellg.y:60:13: note: did you mean to dereference the pointer?
     for(i=0;arg_list[i]!='\0';i++)
             ^

(Aside: I called the files shell.l and shellg.y — using the same basename twice leaves me at a loss because both object files would be shell.o under my normal build regime. I guess you use different rules for compiling your code.)

I changed the '\0' (which is a "null pointer constant", but is an aconventional one and more typically indicates that the writer is confused) into NULL.

The print format in the loop is important; notice how I enclosed the %s in marker characters [%s]. The placement of the ] in the output immediately shows that the newline is the problem. This is a valuable technique; it makes the invisible visible again.

The final fflush(stdout) is not really crucial in this context, but it does ensure that any pending standard output has been generated before the execvp() replaces the program and loses that output forever. It would be reasonable to use fflush(0) or fflush(NULL) to ensure that other file streams (standard error) are also fully flushed, but standard error is normally not buffered very much.

Prescription

Clearly, the fix is to upgrade the lexical code to not include the newline in the argument. However, it isn't immediately obvious why that's happening at all. I altered the grammar to do some printing:

%%
A:A WORD    {printf("R1: len %d [%s]\n", len, $2); $$=$2;insertArg($2,len);count++;}
 |
 ;
C:WORD {printf("R2A: len %d [%s]\n", len, $1); $$=$1;insertComm($1,len);/*printf("%s\n",$$);*/} A  
    {printf("R2B: len %d [%s]\n", len, $3);}
 ;

%%

Note in particular the R2B line; there could be an action there, but you didn't have one.

When this is run, I get:

$ ./shell
ls -l
R2A: len 2 [ls]
R1: len 2 [-l]
R2B: len 2 [-l
]
arg_list[0]= ls
arg_list[0] = [ls]
arg_list[1] = [-l
]
ls: illegal option -- 

usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
$

It's interesting that the -l shows up in the R1 output without the newline, but a newline is added by the time of the R2B output. That was unexpected, but fortuitous. I added the printing to make sure that the coverage was complete; I'm very glad I did!

So, what gives? Why isn't the token being reset to empty? Why is the newline added? Why isn't a copy of the token made, instead of storing a pointer to the yytext?

The long term solution is going to be to make a copy; so we can start out right. I'm going to assume strdup() is available to you and will use that. I added #include <string.h> to the includes and used:

void insertArg(char *arg,int len)
{
    arg_list[count] = strdup(arg);
}

Hey presto! The desired output:

$ ./shell
ls -l
R2A: len 2 [ls]
R1: len 2 [-l]
R2B: len 2 [-l
]
arg_list[0]= ls
arg_list[0] = [ls]
arg_list[1] = [-l]
total 128
-rw-r--r--  1 jleffler  staff   1443 Apr 29 08:36 data
-rwxr-xr-x  1 jleffler  staff  24516 Apr 29 09:08 shell
-rw-r--r--  1 jleffler  staff    297 Apr 29 08:36 shell.l
-rw-r--r--  1 jleffler  staff  13568 Apr 29 08:38 shell.o
-rw-r--r--  1 jleffler  staff   4680 Apr 29 09:08 shellg.o
-rw-r--r--  1 jleffler  staff   1306 Apr 29 09:08 shellg.y
-rw-r--r--  1 jleffler  staff   2245 Apr 29 09:08 y.tab.h
$

Further observations

You are going to need to make sure you release the duplicated strings. You also need to compile with warning options. Contrary to my normal practice, I compiled with just default (almost no) warnings. Compiling the grammar shows:

yacc -d shellg.y
shellg.y:28.3: warning: empty rule for typed nonterminal, and no action
shellg.y:30.3-31.44: warning: unused value: $2
gcc -O -c y.tab.c
mv y.tab.o shellg.o
rm -f y.tab.c

You should resolve both those. You don't declare prototypes for the functions you define — you have:

extern int yylex();
void yyerror();
int append =0;
void insertArg();
void insertComm();

There are 4 function declarations, but none of them is a function prototype. You need to add the expected arguments or void where no arguments are expected.

There are other problems too. With my normal compilation options (rmk is a variant of make; the -u means 'unconditional rebuild'), I get:

$ rmk -ku
    yacc  shellg.y
shellg.y:28.3: warning: empty rule for typed nonterminal, and no action
shellg.y:30.3-31.44: warning: unused value: $2
    gcc -O3   -g         -std=c11   -Wall -Wextra -Werror -Wmissing-prototypes -Wstrict-prototypes         -c y.tab.c
shellg.y:7:5: error: function declaration isn’t a prototype [-Werror=strict-prototypes]
     extern int yylex();
     ^~~~~~
shellg.y:8:5: error: function declaration isn’t a prototype [-Werror=strict-prototypes]
     void yyerror();
     ^~~~
shellg.y:10:5: error: function declaration isn’t a prototype [-Werror=strict-prototypes]
     void insertArg();
     ^~~~
shellg.y:11:5: error: function declaration isn’t a prototype [-Werror=strict-prototypes]
     void insertComm();
     ^~~~
shellg.y:36:6: error: no previous prototype for ‘insertComm’ [-Werror=missing-prototypes]
 void insertComm(char *c,int len)
      ^~~~~~~~~~
shellg.y:45:6: error: no previous prototype for ‘insertArg’ [-Werror=missing-prototypes]
 void insertArg(char *arg,int len)
      ^~~~~~~~~
shellg.y: In function ‘insertArg’:
shellg.y:45:30: error: unused parameter ‘len’ [-Werror=unused-parameter]
 void insertArg(char *arg,int len)
                              ^~~
shellg.y: At top level:
shellg.y:50:6: error: no previous prototype for ‘yyerror’ [-Werror=missing-prototypes]
 void yyerror(char *msg){
      ^~~~~~~
shellg.y: In function ‘main’:
shellg.y:57:9: error: unused variable ‘status’ [-Werror=unused-variable]
     int status;
         ^~~~~~
cc1: all warnings being treated as errors
rmk: error code 1
    lex  shell.l
    gcc -O3   -g         -std=c11   -Wall -Wextra -Werror -Wmissing-prototypes -Wstrict-prototypes         -c lex.yy.c
lex.yy.c:1119:16: error: ‘input’ defined but not used [-Werror=unused-function]
     static int input  (void)
                ^~~~~
lex.yy.c:1078:17: error: ‘yyunput’ defined but not used [-Werror=unused-function]
     static void yyunput (int c, register char * yy_bp )
                 ^~~~~~~
cc1: all warnings being treated as errors
rmk: error code 1
'shell' not remade because of errors.
'all' not remade because of errors.
$

I was too lazy to fix all those problems too.

I think you will need to fix your lexical analyzer too. The white space should not be included in the tokens returned, but it seems to be added to the end, but I'm not quite sure how/why.

$ ./shell
ls -l abelone ducks
R2A: len 2 [ls]
R1: len 2 [-l]
R1: len 7 [abelone]
R1: len 5 [ducks]
R2B: len 5 [ducks
]
arg_list[0]= ls
arg_list[0] = [ls]
arg_list[1] = [-l]
arg_list[2] = [abelone]
arg_list[3] = [ducks]
ls: abelone: No such file or directory
ls: ducks: No such file or directory
$

That R2B printing puzzles me. How/why is that being modified compared with the R1 before which shows ducks and no newline. You'll need to track that down, I think.

Adding diagnostics to the analyzer:

%%
(\-)?{letter}        {printf("L1: [%s]\n", yytext); yylval.id=yytext;len=yyleng;return WORD;}
[ \t\n]              {printf("L2: [%s]\n", yytext);}
.                    {printf("L3: [%s]\n", yytext); return NOTOKEN;}
%%

and running it yields:

$ ./shell
ls -l
L1: [ls]
R2A: len 2 [ls]
L2: [ ]
L1: [-l]
R1: len 2 [-l]
L2: [
]
R2B: len 2 [-l
]
arg_list[0]= ls
arg_list[0] = [ls]
arg_list[1] = [-l]
total 224
-rw-r--r--  1 jleffler  staff   1443 Apr 29 08:36 data
-rw-r--r--  1 jleffler  staff    303 Apr 29 09:16 makefile
-rwxr-xr-x  1 jleffler  staff  24516 Apr 29 09:29 shell
-rw-r--r--  1 jleffler  staff    385 Apr 29 09:29 shell.l
-rw-r--r--  1 jleffler  staff  13812 Apr 29 09:29 shell.o
-rw-r--r--  1 jleffler  staff   4680 Apr 29 09:08 shellg.o
-rw-r--r--  1 jleffler  staff   1306 Apr 29 09:08 shellg.y
-rw-r--r--  1 jleffler  staff  41299 Apr 29 09:16 y.tab.c
-rw-r--r--  1 jleffler  staff   2245 Apr 29 09:08 y.tab.h
$

Have fun tracking how/why the R2B printout includes the newline.

Detailed tracking — printing addresses

JFTR, I'm working on a Mac running macOS 10.13.4 High Sierra, with GCC 7.3.0 (home-built), with Bison 2.3 running as Yacc, and Flex 2.5.35 Apple(flex-31) running as Lex.

Here's a cleaned up shell.l — it compiles cleanly under my stringent warnings regime:

%{
    #include <stdio.h>
    #include <stdlib.h>
    #include "y.tab.h"
    extern int len;
%}
%option noyywrap
%option noinput
%option nounput
letter [a-zA-Z]+

%%
(\-)?{letter}       {printf("L1: [%s]\n", yytext); yylval.id=yytext;len=yyleng;return WORD;}
[ \t\n]             {printf("L2: [%s]\n", yytext);}
.                   {printf("L3: [%s]\n", yytext); return NOTOKEN;}
%%

And here's a more heavily diagnostic-laden version of shellg.y (that also compiles cleanly under my stringent warnings regime). I've reinstated the original code in insertArg() that copies the address into the arg_list array, but I've also added code to print out the complete contents of arg_list on each call. That turns out to be informative!

%{
    #include<stdio.h>
    #include<stdlib.h>
    #include <string.h>
    #include<unistd.h>
    extern int yylex(void);
    void yyerror(char *msg);
    int append =0;
    void insertArg(char *arg, int len);
    void insertComm(char *arg, int len);
    char *arg_list[20];
    int i,count=1;
    int len;
    char command[20];
%}
%union{
    float f;
    char  *id;
}
%start C
%token <id> WORD
%token  NOTOKEN
%type <id> C A

%%
A:  A   WORD    {printf("R1A: len %d %p [%s]\n", len, $2, $2); $$=$2;insertArg($2,len);count++;}
 |  /*Nothing */
                {printf("R1B: - nothing\n");}
 ;

C:  WORD
    {printf("R2A: len %d %p [%s]\n", len, $1, $1); $$=$1;insertComm($1,len);}
    A  
    {printf("R2B: len %d %p [%s]\n", len, $3, $3);}
 ;

%%

void insertComm(char *c, int len)
{
    printf("Command: %d [%s]\n", len, c);
    for (i = 0; i < len; i++)
    {
        command[i] = c[i];
    }
    arg_list[0] = &command[0];
}

void insertArg(char *arg, int len)
{
    printf("Argument: %d [%s]\n", len, arg);
    //arg_list[count] = strdup(arg);
    arg_list[count] = arg;
    for (int i = 0; i < count; i++)
        printf("list[%d] = %p [%s]\n", i, arg_list[i], arg_list[i]);
}

void yyerror(char *msg)
{
    fprintf(stderr, "%s\n", msg);
    exit(1);
}

int main(void)
{
    while (1)
    {
        yyparse();
        printf("arg_list[0]= %s\n", arg_list[0]);
        for (i = 0; arg_list[i] != NULL; i++)
        {
            printf("arg_list[%d] = [%s]\n", i, arg_list[i]);
        }
        fflush(stdout);
        execvp(arg_list[0], arg_list);
    }
}

When compiled and run, I can get the output:

$ ./shell
ls -l -rt makefile data shell
L1: [ls]
R2A: len 2 0x7f9dd4801000 [ls]
Command: 2 [ls]
R1B: - nothing
L2: [ ]
L1: [-l]
R1A: len 2 0x7f9dd4801003 [-l]
Argument: 2 [-l]
list[0] = 0x10ace8180 [ls]
L2: [ ]
L1: [-rt]
R1A: len 3 0x7f9dd4801006 [-rt]
Argument: 3 [-rt]
list[0] = 0x10ace8180 [ls]
list[1] = 0x7f9dd4801003 [-l -rt]
L2: [ ]
L1: [makefile]
R1A: len 8 0x7f9dd480100a [makefile]
Argument: 8 [makefile]
list[0] = 0x10ace8180 [ls]
list[1] = 0x7f9dd4801003 [-l -rt makefile]
list[2] = 0x7f9dd4801006 [-rt makefile]
L2: [ ]
L1: [data]
R1A: len 4 0x7f9dd4801013 [data]
Argument: 4 [data]
list[0] = 0x10ace8180 [ls]
list[1] = 0x7f9dd4801003 [-l -rt makefile data]
list[2] = 0x7f9dd4801006 [-rt makefile data]
list[3] = 0x7f9dd480100a [makefile data]
L2: [ ]
L1: [shell]
R1A: len 5 0x7f9dd4801018 [shell]
Argument: 5 [shell]
list[0] = 0x10ace8180 [ls]
list[1] = 0x7f9dd4801003 [-l -rt makefile data shell]
list[2] = 0x7f9dd4801006 [-rt makefile data shell]
list[3] = 0x7f9dd480100a [makefile data shell]
list[4] = 0x7f9dd4801013 [data shell]
L2: [
]
R2B: len 5 0x7f9dd4801018 [shell
]
arg_list[0]= ls
arg_list[0] = [ls]
arg_list[1] = [-l -rt makefile data shell
]
arg_list[2] = [-rt makefile data shell
]
arg_list[3] = [makefile data shell
]
arg_list[4] = [data shell
]
arg_list[5] = [shell
]
ls: illegal option --  
usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
$

Notice how the pointers in arg_list are pointing to different positions in a single string. The lexical analyzer inserts a null after the token when it returns, but replaces that null with the blank or newline (or tab if I'd typed any). This shows why it is necessary to copy the tokens. What's "in" arg_list[1] changes as the lexical analysis proceeds. It's why the newline appears.

Note the comments

Please note the comments by rici and follow the link too:

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • I get that my code was really bad. Though how does the strdup() work up two problems at time , 1st problem which you gave the solution for and second one - if you were to replace the insertArg() code with previous code you will see this output ` ./nohop ls -l -F arg_list[0]= ls arg_list[0]= [ls] arg_list[1]= [-l -F ] arg_list[2]= [-F ] ls: invalid option -- ' ' Try 'ls --help' for more information. – Vedant Tripathi Apr 29 '18 at 16:56
  • We all had to start somewhere. There's worse code on SO by miles.And the compilation options shown are very fussy. Eventually (maybe even soon), you'll get the hang of making sure you have prototypes for functions and so on. If you've not seen the discipline before, it is daunting, but it quickly becomes second nature and averts many problems. Use fussy compilation options, and make your compilations fail when there are warnings unless you have a (very) good reason not to do so. I usually don't run code that doesn't compile without warnings under the options shown — but I like code that works. – Jonathan Leffler Apr 29 '18 at 16:58
  • As the output with the debug in the lexical code shows (I think), the lexical analyzer is producing correct tokens as it returns. By copying those correct tokens into the grammar semantic support structures (rather than leaving pointers to stuff in the lexical analyzer), you avoid the lexical analyzer changing what the grammar is working with. Look at the white space rule in the grammar; I think that is part of the problem. The `arg_list[1]` set to `"-l -F "` and `arg_list[2]` set to `"-F "` is a symptom here; things are being concatenated, somehow, and I'm not 100% sure how. _[…continued…]_ – Jonathan Leffler Apr 29 '18 at 17:06
  • _[…continuation…]_ Working that out might involve more intense diagnostic printing, looking at addresses, etc. That's part of why I've showed how I went about finding out what was happening. Your current lexical analyzer is clearly minimal, which is fine for the question. It will become more complex as it needs to handle I/O redirection and strings and variables and commands with digits in the names, and options with digits, and so on. As I said, I don't have a good explanation for what's going on, but part of the trouble is that `yytext` contains the current token, and is overwritten. – Jonathan Leffler Apr 29 '18 at 17:10
  • You could perhaps find out more by printing the command and arguments frequently. Also, note that you already copy the text from `yytext` into `command`, but you only copied a pointer, not the text, into the arguments. That isn't `strdup()`, but the net result is the same; the command name is copied before it gets hacked up by further lexical analysis. – Jonathan Leffler Apr 29 '18 at 17:13
  • But why the same thing isn't happening in command, when lex returns command , i am using the same action there as well , i mean.? – Vedant Tripathi Apr 29 '18 at 17:18
  • See the new 'detailed tracking' section at the end of the answer. – Jonathan Leffler Apr 29 '18 at 17:46
  • 1
    The original `lex` scanner declared `yytext` as something like `char yytext[YYLMAX];` which makes it obvious that the contents must be copied (since they will be overwritten on the next call). Flex, however, declares `yytext` to be `char*`, and it doesn't have the same value on every action. Instead, it points into the scanner's internal buffer. It still needs to be copied: the buffer will still be overwritten when more input is required, and also the NUL terminator is temporary. See http://westes.github.io/flex/manual/A-Note-About-yytext-And-Memory.html – rici Apr 29 '18 at 21:54
  • @rici. Thanks. The 'detailed tracking' section shows exactly the behaviour described in your reference. I was wondering what should happen with more complex grammars than the one in this question, ones with multiple tokens needing to be managed in a sequence, for example. – Jonathan Leffler Apr 29 '18 at 22:00
  • 1
    Since yacc/bison parsers use one lookahead token, it is essential that the token be copied in the lexer action if it will be required by a parser action. That's always been the case; nonetheless, "my token string has been corrupted" is the number one yacc/bison FAQ section of all time. (See the bison faqs) – rici Apr 30 '18 at 01:50