I am writing a C program that takes a list of commands from stdin and execs them. I am getting unexpected results from strcmp on the strings read from stdin.

Here is my program, test_execvp.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h> 

int main(int argc, char const *argv[])
{
    char * line_buffer[100];
    size_t line_len;
    int cmd_count = 0;
    char * cmd_buffer[100][100];

    for( line_buffer[cmd_count] = NULL; getline(&line_buffer[cmd_count], &line_len, stdin) > 0; line_buffer[++cmd_count] = NULL)
    {
        line_buffer[cmd_count][strcspn(line_buffer[cmd_count], "\r\n")] = 0;
        int cmd = 0;
        while( (cmd_buffer[cmd_count][cmd] = strsep(&line_buffer[cmd_count], " ")) != NULL )
        {
            cmd++;
        }
    }


    printf("cmd_buffer[0][0]: \"%s\"\n", cmd_buffer[0][0]);
    printf("cmd_buffer[0][1]: \"%s\"\n", cmd_buffer[0][1]);
    printf("cmd_buffer[0][2]: \"%s\"\n", cmd_buffer[0][2]);
    printf("strcmp(cmd_buffer[0][1], \"-i\") == %d\n", strcmp(cmd_buffer[0][1], "-i") );
    printf("strcmp(cmd_buffer[0][1], \"-o\") == %d\n", strcmp(cmd_buffer[0][1], "-o") );

}

Now see this output:

Emil@EMIL-HP ~/Emil
$ gcc test_execvp.c -o test_execvp

Emil@EMIL-HP ~/Emil
$ cat cmdfile2
./addone –i add.txt
./addone
./addone –o add.txt

Emil@EMIL-HP ~/Emil
$ ./test_execvp < cmdfile2
cmd_buffer[0][0]: "./addone"
cmd_buffer[0][1]: "–i"
cmd_buffer[0][2]: "add.txt"
strcmp(cmd_buffer[0][1], "-i") == 181
strcmp(cmd_buffer[0][1], "-o") == 181

I don't understand how the line:

printf("strcmp(cmd_buffer[0][1], \"-i\") == %d\n", strcmp(cmd_buffer[0][1], "-i") );

can produce the output:

strcmp(cmd_buffer[0][1], "-i") == 181

if the line:

printf("cmd_buffer[0][1]: \"%s\"\n", cmd_buffer[0][1]);

produces the output:

cmd_buffer[0][1]: "–i"
  • The dashes in `cmdfile2` are apparently UTF-8 bytes 0xE2 0x80 0x93 = U+2013 EN DASH (`–`). At least, that's what I get from copy'n'paste; the `-` in the code is the regular U+002D HYPHEN-MINUS. This echoes what others said in answers. You'll need to edit `cmdfile2` and replace the en-dashes with normal dashes — how you do that depends on your editor of choice. – Jonathan Leffler May 17 '16 at 06:07

2 Answers

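If cmd_buffer[0][1] were "-i", then strcmp would return 0. But it's not. Look closely and you will see that it is "–i", which starts with a different character: an en dash (U+2013) rather than the ASCII hyphen-minus (U+002D). It is wider on screen and takes three bytes in UTF-8 instead of one.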

rici
  • You're right, I see the difference now. Any suggestions on how to overcome this, either in the code or in the text file? – user3459138 May 17 '16 at 05:34
  • @user3459138: edit the textfile with a code editor instead of a word processor, or turn off automatic character substitutions. Then erase the – and insert a - – rici May 17 '16 at 05:44

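Your text file contains some Unicode homoglyph for - rather than an actual -. This is clear because strcmp here evidently returned the difference between the first pair of differing bytes, and 181 + '-' (0x2d, decimal 45) is 226 = 0xe2, the lead byte of a 3-byte UTF-8 sequence.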

R.. GitHub STOP HELPING ICE