
I am reading in a line of text from a file, and trying to split it on spaces with strtok(), then storing each token in an array of char pointers using strcpy(), within a function called tokenize.

Code:

/**
** Tokenizes contents of buffer into words
** Delimiter is a space character
*/
void tokenize(char *buffer, char *tokens[]){
    char *token = NULL;
    int i = 0;
    token = strtok(buffer, " ");
    while(token != NULL){
        strcpy(tokens[i], token);//seg fault line
        token = strtok(NULL, " ");
        i++;
    }
}

I assumed, based on the function description in K&R, that my call would work because I passed a char pointer, which is what tokens[i] should dereference to, and another char pointer containing the memory address of the string to be copied, which is what token should be. However, I get a seg fault when I call strcpy().

This function is called in main immediately after a line is retrieved using a call to fgets(). buffer is declared as char buffer[MAXLINE_LEN - 1] which gives it a size of 100. tokens is declared as char *tokens[MAXLINE_LEN - 1].

Call to tokenize in main:

while(fgets(buffer, MAXLINE_LEN, inputFile)){
    int choice;
    printf("%s", buffer);
    tokenize(buffer, tokens);
    /**
    ....
    */
}

I am using: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609

EDIT: Declaration of tokens (this occurs before the while-loop shown above, and in main):

char *tokens[MAXLINE_LEN - 1];//MAXLINE_LEN - 1 = 100

int a;
for(a = 0; a < MAXLINE_LEN - 1; a++)
    tokens[a] = NULL;
Ungeheuer

2 Answers


There are four possibilities I can think of:

  1. tokens is NULL or uninitialised
  2. tokens is initialised but you haven't allocated enough space. What if you allocate space for four tokens but strtok() parses five tokens?
  3. The individual elements of tokens are NULL or uninitialised
  4. You didn't allocate enough space in each element to copy a token string.

One or more of those almost certainly applies to your code. One way to fix it would be to allocate tokens entirely inside the function and return it through a pointer parameter (with the count as the return value).

#include <stdlib.h>
#include <string.h>

enum
{
    CHUNK_SIZE = 16 // A trick to get an int constant in C that works everywhere a #define works
};

size_t tokenize(char *buffer, char **tokenRet[])
{
    size_t capacity = 0;
    char** tokens = NULL;
    size_t count = 0;
    char *token = NULL;
    token = strtok(buffer, " ");
    while(token != NULL)
    {
        count++;
        if (count > capacity)
        {
            capacity += CHUNK_SIZE;
            tokens = realloc(tokens, capacity * sizeof *tokens);
        }
        tokens[count - 1] = strdup(token);//strdup mallocs space and copies the string into it
        token = strtok(NULL, " ");
    }
    *tokenRet = tokens;
    return count;
}

Call it like this

char** myTokens = NULL;
size_t tokenCount = tokenize(buffer, &myTokens);

Then to tidy up at the end, free() the first tokenCount elements of myTokens and then free myTokens.
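
For example, a minimal cleanup sketch (assuming myTokens and tokenCount are the variables from the call above):

for(size_t i = 0; i < tokenCount; i++)
    free(myTokens[i]); // free each string that strdup() allocated
free(myTokens);        // then free the array of pointers itself
myTokens = NULL;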

JeremyP
  • is `**tokenRet[]` a three dimensional array of char pointers? Why would I need this? I'm new to C, and pointers are doing a doozy on my brain, esp. at 5am – Ungeheuer Apr 07 '17 at 09:07
  • No `tokenRet` is a pointer to an array of pointers to `char`. In this case you can think of it as a pointer to an array of strings. The reason it needs the extra indirection is so that tokenize can reach back into its caller and change the array of strings declared there to be the array that it has built from buffer. – JeremyP Apr 07 '17 at 09:11
  • `char* myTokens[] = NULL;` --> `char** myTokens = NULL;` – BLUEPIXY Apr 07 '17 at 09:16

What is the size of "char *tokens[]"? It is possible that you have more tokens in "char *buffer" than tokens[] can hold. Also, did you allocate memory for each of the pointers in the array?
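
As a rough sketch (not necessarily the best fix), you could give every slot its own buffer before calling the existing tokenize(); this assumes MAXLINE_LEN is the same constant from the question and that <stdlib.h> is included for malloc():

char *tokens[MAXLINE_LEN - 1];
int a;
for(a = 0; a < MAXLINE_LEN - 1; a++){
    tokens[a] = malloc(MAXLINE_LEN); // room for the longest possible token plus '\0'
    if(tokens[a] == NULL){
        /* handle allocation failure */
    }
}
tokenize(buffer, tokens); // strcpy() inside tokenize now has valid destinations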

mathedi
  • As I mentioned in my question, `tokens` is declared to have a size of 100. It is not possible for `buffer` to hold more than 100 tokens, as it contains 100 characters. At most then, there could be 50 tokens in `buffer` if each token was a single character with a space between each one. – Ungeheuer Apr 07 '17 at 08:51
  • Did you allocate memory for each tokens[] pointer before passing down to tokenize()? – mathedi Apr 07 '17 at 08:54
  • No I didn't. I assumed they would be allocated memory when I declared the array; in fact, they have 8 bytes, which is the proper pointer size for my machine. If I did use a malloc call before passing them to tokenize, I would have no idea how much to allocate to each pointer. Also, what would I be allocating memory for/to? – Ungeheuer Apr 07 '17 at 09:03