Tokenizing a String to Pass as char * into execve()

Question

My knowledge of C is very limited. I'm trying to tokenize a string passed to a server from a client, because I want to use the passed arguments with execve. The arguments passed via buffer need to be copied to *argv and tokenized such that buffer's tokens can be accessed with argv[0], argv[1], etc. Obviously I'm doing something incorrectly.

n = read(sockfd, buffer, sizeof(buffer));
strcpy(*argv, buffer);
printf("buffer:%s\n", buffer);
printf("argv:%s\n", *argv);
printf("argv[0]:%s\n", argv[0]);
printf("argv[1]:%s\n", argv[1]);
*argv = strtok_r(*argv, " ", argv);
printf("argv:%s\n", *argv);

i = fork();
if (i < 0) {
    //Close socket on fork error.
    perror("fork");
    exit(-1);
} else if (i == 0) {
    //execve on input args
    execve(argv[0], &argv[0], 0);
    exit(0);
} else {
    wait(&status);
    //close(sockfd);
}

Passing the arguments "/bin/date -u" with the above code gives an output of:

buffer:/bin/date -u

argv:/bin/date -u

argv[0]:/bin/date -u

argv[1]:(null)

What I want is an output of:

buffer:/bin/date -u

argv:/bin/date -u

argv[0]:/bin/date

argv[1]:-u

I tried using strtok_r(), but it didn't work as I intended. The snippet I inserted was:

*argv = strtok_r(*argv, " ", argv);
printf("argv:%s\n", *argv);

which gives an output of argv:/bin/date.

Thanks in advance, SO.

Edit: I don't have to explicitly tokenize buffer like I have above. Any way to get arguments from the client passed to the server works fine.

David C. Rankin · Accepted Answer · 2014-07-17T07:51:51.023

Well, there are several issues you are dealing with. The first is the choice of argv as the variable you are writing buffer to. While it is just an array of pointers, argv is generally considered the array holding the arguments passed to the current process, not a variable to modify. That is really a matter of convention; there is no prohibition against doing it that I know of. However, you cannot tokenize *argv while at the same time assigning the tokens to *argv, because strtok_r modifies *argv during the process.

Beyond that, the real issue appears to be the use of strtok_r. Take a look at man strtok_r. To tokenize a string, you need to make repeated calls to strtok_r to extract all the tokens. The first call, with the string as the first argument (*argv...), merely extracts the first token. To complete the extraction, you must pass NULL as the first argument on each subsequent call until all tokens have been extracted. Additionally, the string you are extracting tokens from is modified by the calls to strtok_r and should not be relied on in its original form afterwards. Generally a copy of the string is made to preserve the original if it will be needed later.
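To make the "copy first, then call repeatedly with NULL" pattern concrete, here is a minimal sketch. The helper name count_tokens is my own invention for illustration, and it assumes a POSIX system where strdup and strtok_r are available. It tokenizes a private copy, so the caller's string is left untouched.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the question): counts the space-delimited
 * tokens in `input` without modifying it. Returns 0 if allocation fails. */
size_t count_tokens(const char *input)
{
    /* strtok_r writes '\0' bytes over delimiters, so work on a copy. */
    char *work = strdup(input);
    if (work == NULL)
        return 0;

    size_t count = 0;
    char *save = NULL;   /* strtok_r keeps its position here between calls */

    /* First call passes the string; every later call passes NULL. */
    for (char *tok = strtok_r(work, " ", &save);
         tok != NULL;
         tok = strtok_r(NULL, " ", &save)) {
        count++;
    }

    free(work);
    return count;
}
```

Calling count_tokens("/bin/date -u") walks over two tokens, /bin/date and -u. Runs of spaces are treated as a single delimiter, so empty tokens are never produced.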

In your code you call strtok_r only once, e.g.:

*argv = strtok_r(*argv, " ", argv);  // extracts the first token and modifies *argv

If your intent is to extract all strings, then you will need to make repeated calls to strtok_r, something like:

char *saveptr = NULL;   // strtok_r keeps its position here between calls
char *token;            // points into the string being tokenized; no allocation needed

token = strtok_r(*argv, " ", &saveptr);
if (token)
    printf (" token: %s\n", token);

while ((token = strtok_r (NULL, " ", &saveptr)) != NULL)
{
    printf (" token: %s\n", token);
}

You can capture the individual tokens however you like in order to pass them to execve. However, you are not going to be able to strip tokens out of argv while at the same time writing back to argv. As indicated above, the string in argv is modified by strtok_r during extraction, so you will need a separate array to hold the tokens. Hope this helps.
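As a rough sketch of the "separate array" idea, the function below stores each token pointer in a caller-supplied array and ends the list with NULL, which is the layout execve expects. The name split_args and its parameters are assumptions of mine, not anything from the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `buf` on spaces and stores up to
 * max_args - 1 token pointers in `args`, followed by a NULL terminator.
 * `buf` is modified in place and must stay alive as long as `args` is used,
 * because the stored pointers point into it.
 * Returns the number of tokens stored. */
size_t split_args(char *buf, char *args[], size_t max_args)
{
    size_t n = 0;
    char *save = NULL;   /* separate state variable, not the output array */

    if (max_args == 0)
        return 0;

    for (char *tok = strtok_r(buf, " ", &save);
         tok != NULL && n + 1 < max_args;   /* keep one slot for NULL */
         tok = strtok_r(NULL, " ", &save)) {
        args[n++] = tok;
    }

    args[n] = NULL;
    return n;
}
```

With buf holding "/bin/date -u" and an 8-element args array, this stores "/bin/date" in args[0], "-u" in args[1], and NULL in args[2]. Tokens beyond the capacity of the array are silently dropped in this sketch; real code may prefer to report that as an error.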

Dmitry · Answer 2 · 2014-07-17T09:12:43.807

The strtok() and strtok_r() functions return one token at a time. They maintain state between calls (strtok() internally, strtok_r() in a pointer you supply), and you need to call them in a loop to split a string into tokens. They also modify the buffer passed as the first argument in place, so you need to copy it if the original must be preserved.

Let me show you an example:

#include <stdio.h>
#include <string.h>

#define MAX_CMD_SIZE 1024
#define MAX_ARG_COUNT 10

int main(void)
{
    const char *command = "/bin/test arg1 arg2 arg3 arg4 arg5";

    /* Allocate a buffer for tokenization.
     * The strtok_r() function modifies this buffer in-place and returns pointers
     * to strings located inside this buffer.
     * Copy at most sizeof(cmd_buf) - 1 bytes so the zero-initialized buffer
     * always stays '\0'-terminated. */
    char cmd_buf[MAX_CMD_SIZE] = { 0 };
    strncpy(cmd_buf, command, sizeof(cmd_buf) - 1);

    /* This strtok_r() call puts '\0' after the first token in the buffer.
     * It saves the state to strtok_state, and subsequent calls resume from that point. */
    char *strtok_state = NULL;
    char *filename = strtok_r(cmd_buf, " ", &strtok_state);
    printf("filename = %s\n", filename);

    /* Allocate an array of pointers.
     * We will make them point to certain locations inside the cmd_buf. */
    char *args[MAX_ARG_COUNT] = { NULL };

    /* loop the strtok_r() call while there are tokens and free space in the array */
    size_t current_arg_idx;
    for (current_arg_idx = 0; current_arg_idx < MAX_ARG_COUNT; ++current_arg_idx) {
        /* Note that the first argument to strtok_r() is NULL.
         * That means resume from a point saved in the strtok_state. */
        char *current_arg = strtok_r(NULL, " ", &strtok_state);
        if (current_arg == NULL) {
            break;
        }

        args[current_arg_idx] = current_arg;
        printf("args[%zu] = %s\n", current_arg_idx, args[current_arg_idx]);
    }

    return 0;
}

The output of the example above is:

filename = /bin/test
args[0] = arg1
args[1] = arg2
args[2] = arg3
args[3] = arg4
args[4] = arg5

Note that I put filename and args into separate variables to illustrate a difference between the first call and the subsequent calls. For execve() you normally want to put them into a single array and call it like execve(argv[0], argv, NULL); because the filename is supposed to be the first element in argv.
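For completeness, here is one way the single-array version could look when combined with fork() and execve(). Treat it as a sketch under assumptions: run_command and MAX_ARGS are names I made up, the child gets an empty environment, and the exit code 127 on exec failure is just a common shell convention.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ARGS 16   /* assumed limit, including the terminating NULL */

/* Hypothetical helper: tokenizes `cmdline` in place, runs the program with
 * execve() in a child process, and returns the child's exit status.
 * Returns -1 if there is nothing to run, on fork/wait failure, or if the
 * child did not exit normally. */
int run_command(char *cmdline)
{
    char *argv[MAX_ARGS];
    char *envp[] = { NULL };   /* empty environment for the child */
    char *save = NULL;
    int argc = 0;

    /* The filename and its arguments go into one array: argv[0] is the path. */
    for (char *tok = strtok_r(cmdline, " ", &save);
         tok != NULL && argc < MAX_ARGS - 1;
         tok = strtok_r(NULL, " ", &save)) {
        argv[argc++] = tok;
    }
    argv[argc] = NULL;         /* execve() requires a NULL-terminated array */

    if (argc == 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execve(argv[0], argv, envp);
        _exit(127);            /* reached only if execve() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On the server side, keep in mind that read() does not '\0'-terminate the data, so the buffer needs something like buffer[n] = '\0' (with room reserved for it) before it is handed to a function like this. If the client sends a trailing newline, using " \n" as the delimiter string would strip it from the last token.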
