You are free to use either calloc
(or malloc
or realloc
) or strdup
if you have it. All strdup
does is to automate the process of allocating length + 1 characters and then copying the given string to the new block of memory which you then assign to your pointer. It does the exact same thing you would do on your own if you didn't have strdup
available.
However, note, strdup
allocates so you must validate the allocation just as you would if you had called one of the allocation functions directly. Further note, on failure, it sets errno
and returns NULL
just as any of the allocation functions do.
Before looking at examples, you have several other sources of error you must address. You declare both buff
and line
with a fixed size. Therefore, as you are adding characters to buff
or filling pointers in line
, you must be keeping track of the index and checking against the maximum available to protect your array bounds. If you have a file with 1024-character
lines or lines with 2025
words, you write past the end of each array invoking Undefined Behavior.
Equally important is your choice of variable names. line
isn't a line, it is an array or pointers to tokens or words
separated by the delimiters you provide to strtok
. The only variable that contains a "line" is buff
. If you were going to call anything line, you should change buff
to line
and change line
to word
(or token
). Now it is possible that your file does contain lines from something else separated by ':'
in the input file (not provided), but without more, we will change line
to word
and line_index
to word_index
in the example. While your use of word
as the word-pointer was fine, let's shorten it to wp
to avoid conflict with the renamed word
array of pointers to each word. buff
is fine, you know you are buffering characters in it.
Your final problem is you have not nul-terminated buff
before passing buff
to strtok
. All string functions require a nul-terminated string as their argument. Failure to provide on invokes Undefined Behavior.
Preliminary
Do not use magic numbers in your code. Instead:
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
char buff[MAXC] = "", /* line buffer */
*word[MAXC * 2] = {NULL}; /* array of pointers to char */
int ch,
i = 0;
...
while ((ch = fgetc(fp)) != EOF) { /* read each char until EOF */
int word_index = 0;
buff[i++] = ch;
if (i == MAXC - 1 || ch == '\n') { /* protect array bounds */
char *wp; /* word pointer */
buff[i++] = 0; /* nul-termiante buf */
for (wp = strtok (buff, " :\n"); /* initial call to strtok */
word_index < MAXC && wp; /* check bounds & wp */
wp = strtok (NULL, " :\n")) { /* increment - next token */
(note: you should actually check if (i == MAXC - 1 || (i > 1 && ch == '\n'))
to avoid attempting to tokenize empty-lines -- that is left to you. Also note a for
loop provides a convenient means to cover both calls to strtok
in a single expression)
Using strdup
to Allocate/Copy
As mentioned above, all strdup
does is to allocate storage for the word pointed to by wp
above (including storage for the nul-terminating characters), copy the word to the newly allocated block of memory, and then return a pointer to the first address in that block allowing you to assign the starting address to your pointer. It is quite convenient, but since it allocates, you must validate, e.g.
/* NOTE: strdup allocates - you must validate */
if (!(word[word_index] = strdup (wp))) {
perror ("strdup(wp)");
exit (EXIT_FAILURE);
}
word_index++; /* increment after allocation validated */
Using strlen + calloc + memcpy
To Do The Same Thing
If strdup
wasn't available, or you simply wanted to allocate and copy manually, then you would just do the exact same thing. (1) Get the length of the word (or token) pointed to by wp
, (2) allocate length + 1
bytes; and (3) copy the string pointed to by wp
to your newly allocated block of memory. (the assignment of the starting address of the new block to your pointer occurs at the point of allocation).
An efficiency nit about copying to your new block of memory. Since you have already scanned forward in the string pointed to by wp
to find the length, there is no need to scan the string again by using strcpy
. You have the length, so just use memcpy
to avoid the second scan for end-of-string (it's trivial, but shows an understanding of what has taken place in your code). Using calloc
you would do:
/* using strlen + calloc + memcpy */
size_t len = strlen (wp); /* get wp length */
/* allocate/validate */
if (!(word[word_index] = calloc (1, len + 1))) {
perror ("calloc(1,len+1)");
exit (EXIT_FAILURE);
} /* you alread scanned for '\0', use memcpy */
memcpy (word[word_index++], wp, len + 1);
Now if I were to do that for a whole line, when I find a new line in
the file how do I reuse my char *line[2024] array?
Well, it's called word
now, but as mentioned in the comment, you have kept track of the number of pointers filled with your line_index
(my word_index
) variables, so before you can allocate a new block of memory and assign a new address to your pointer (thereby overwriting the old address held by the pointer), you must free
the block of memory at the address currently held by the pointer (or your will lose the ability to ever free that memory causing a memory leak). It is good (but optional) to set the pointer to NULL
after freeing the memory.
(doing so ensures only valid pointers remain in your array of pointers allowing you to iterate the array, e.g. while (line[x] != NULL) { /* do something */ x++; }
- useful if passing or returning a pointer to that array)
To reuse the free the memory, reset the pointers for reuse and reset your character index i = 0
, you could do something like the following while outputting the words from the line, e.g.
}
for (int n = 0; n < word_index; n++) { /* loop over each word */
printf ("word[%2d]: %s\n", n, word[n]); /* output */
free (word[n]); /* free memory */
word[n] = NULL; /* set pointer NULL (optional) */
}
putchar ('\n'); /* tidy up with newline */
i = 0; /* reset i zero */
}
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
}
Putting it altogether in an example that lets you choose whether to use strdup
or use calloc
depending on whether you pass the command line definition -DUSESTRDUP
as part of your compiler string, you could do something like the following (note: I use fp
instead of file
for the FILE*
pointer):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
char buff[MAXC] = "", /* line buffer */
*word[MAXC * 2] = {NULL}; /* array of pointers to char */
int ch,
i = 0;
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while ((ch = fgetc(fp)) != EOF) { /* read each char until EOF */
int word_index = 0;
buff[i++] = ch;
if (i == MAXC - 1 || ch == '\n') { /* protect array bounds */
char *wp; /* word pointer */
buff[i++] = 0; /* nul-termiante buf */
for (wp = strtok (buff, " :\n"); /* initial call to strtok */
word_index < MAXC && wp; /* check bounds & wp */
wp = strtok (NULL, " :\n")) { /* increment - next token */
#ifdef USESTRDUP
/* NOTE: strdup allocates - you must validate */
if (!(word[word_index] = strdup (wp))) {
perror ("strdup(wp)");
exit (EXIT_FAILURE);
}
word_index++; /* increment after allocation validated */
#else
/* using strlen + calloc + memcpy */
size_t len = strlen (wp); /* get wp length */
/* allocate/validate */
if (!(word[word_index] = calloc (1, len + 1))) {
perror ("calloc(1,len+1)");
exit (EXIT_FAILURE);
} /* you alread scanned for '\0', use memcpy */
memcpy (word[word_index++], wp, len + 1);
#endif
}
for (int n = 0; n < word_index; n++) { /* loop over each word */
printf ("word[%2d]: %s\n", n, word[n]); /* output */
free (word[n]); /* free memory */
word[n] = NULL; /* set pointer NULL (optional) */
}
putchar ('\n'); /* tidy up with newline */
i = 0; /* reset i zero */
}
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
}
Compile
By default the code will use calloc
for allocation, and a simple gcc compile string would be:
gcc -Wall -Wextra -pedantic -std=c11 -O3 -o strtokstrdupcalloc strtokstrdupcalloc.c
For VS (cl.exe
) you would use
cl /nologo /W3 /wd4996 /Ox /Festrtokstrdupcalloc /Tc strtokstrdupcalloc.c
(which will create strtokstrdupcalloc.exe
in the current directory on windows)
To compile using strdup
, just add -DUSESTRDUP
to either command line.
Example Input File
$ cat dat/captnjack.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
Example Use/Output
$ ./bin/strtokstrdupcalloc dat/captnjack.txt
word[ 0]: This
word[ 1]: is
word[ 2]: a
word[ 3]: tale
word[ 0]: Of
word[ 1]: Captain
word[ 2]: Jack
word[ 3]: Sparrow
word[ 0]: A
word[ 1]: Pirate
word[ 2]: So
word[ 3]: Brave
word[ 0]: On
word[ 1]: the
word[ 2]: Seven
word[ 3]: Seas.
(output is the same regardless how you allocate)
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind
is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/strtokstrdupcalloc dat/captnjack.txt
==4946== Memcheck, a memory error detector
==4946== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==4946== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==4946== Command: ./bin/strtokstrdupcalloc dat/captnjack.txt
==4946==
word[ 0]: This
word[ 1]: is
word[ 2]: a
word[ 3]: tale
word[ 0]: Of
word[ 1]: Captain
word[ 2]: Jack
word[ 3]: Sparrow
word[ 0]: A
word[ 1]: Pirate
word[ 2]: So
word[ 3]: Brave
word[ 0]: On
word[ 1]: the
word[ 2]: Seven
word[ 3]: Seas.
==4946==
==4946== HEAP SUMMARY:
==4946== in use at exit: 0 bytes in 0 blocks
==4946== total heap usage: 17 allocs, 17 frees, 628 bytes allocated
==4946==
==4946== All heap blocks were freed -- no leaks are possible
==4946==
==4946== For counts of detected and suppressed errors, rerun with: -v
==4946== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have further questions.