2

I am wondering how to go about storing strings into an array of strings.

char buff[1024]; // to get chars from fgetc
char *line[2024]; // to store strings found in buff
int ch;
int i = 0;
while ((ch = fgetc(file)) != EOF) {
    buff[i] = ch;
    if (ch == '\n') { // find the end of the current line
       int line_index = 0;
       char *word;
       word = strtok(buff, ":"); // split the buffer into strings
       line[line_index] = word;
       line_index++;
       while (word != NULL) {
           word = strtok(NULL, ":");
           line[line_index] = word;
           line_index++;
       }
   }

Do I need to dynamically allocate every element of the line array before inserting word into that element?

Kev K
  • 17
  • 2
  • 1
    I would go for `strdup`. Yes, you need, word points into buff on each line readed. – KamilCuk Nov 01 '18 at 01:16
  • Change: `line[line_index] = word;` into `line[line_index] = strdup(word);` (in both places). Otherwise, the 2nd line will trash the word values located in the 1st line (and so on ...). – Craig Estey Nov 01 '18 at 01:18
  • You may also find `while (1) { char *cp = fgets(buff,sizeof(buff),file); if (cp == NULL) break; cp = strchr(buff,'\n'); if (cp != NULL) *cp = 0; // do strtok stuff ... }` easier than using `fgetc`. YMMV – Craig Estey Nov 01 '18 at 01:23
  • Thank you that helps. Now if I were to do that for a whole line, when I find a new line in the file how do I reuse my char *line[2024] array? – Kev K Nov 01 '18 at 02:21
  • 1
    You allocated for `line_index` words. To reuse those same pointers, make sure you `free` the memory -- or your next allocation and assignment with `strdup` will overwrite the address held in each filled `line[line_index]` resulting in a memory leak. So `for (int i = 0; i < line_index; i++) { free (line[i]); line[i] = NULL; }` before looping again for the next line. – David C. Rankin Nov 01 '18 at 03:16
  • OT: the posted code contains some 'magic' numbers. IE. 1024. 'magic' numbers make the code much more difficult to understand, debug, etc. Suggest using `#define` statements or and `enum` statement to give those 'magic' numbers meaningful names, then use those meaningful names throughout the code. – user3629249 Nov 01 '18 at 04:53
  • The posted code does not compile! it is missing some needed statements, like the `#include` statements for the needed header files. Are you expecting us to guess as to which header files your code actually uses? – user3629249 Nov 01 '18 at 04:55

2 Answers2

3

You are free to use either calloc (or malloc or realloc) or strdup if you have it. All strdup does is to automate the process of allocating length + 1 characters and then copying the given string to the new block of memory which you then assign to your pointer. It does the exact same thing you would do on your own if you didn't have strdup available.

However, note, strdup allocates so you must validate the allocation just as you would if you had called one of the allocation functions directly. Further note, on failure, it sets errno and returns NULL just as any of the allocation functions do.

Before looking at examples, you have several other sources of error you must address. You declare both buff and line with a fixed size. Therefore, as you are adding characters to buff or filling pointers in line, you must be keeping track of the index and checking against the maximum available to protect your array bounds. If you have a file with 1024-character lines or lines with 2025 words, you write past the end of each array invoking Undefined Behavior.

Equally important is your choice of variable names. line isn't a line, it is an array or pointers to tokens or words separated by the delimiters you provide to strtok. The only variable that contains a "line" is buff. If you were going to call anything line, you should change buff to line and change line to word (or token). Now it is possible that your file does contain lines from something else separated by ':' in the input file (not provided), but without more, we will change line to word and line_index to word_index in the example. While your use of word as the word-pointer was fine, let's shorten it to wp to avoid conflict with the renamed word array of pointers to each word. buff is fine, you know you are buffering characters in it.

Your final problem is you have not nul-terminated buff before passing buff to strtok. All string functions require a nul-terminated string as their argument. Failure to provide on invokes Undefined Behavior.

Preliminary

Do not use magic numbers in your code. Instead:

#define MAXC 1024   /* if you need a constant, #define one (or more) */

int main (int argc, char **argv) {

    char buff[MAXC] = "",           /* line buffer */
        *word[MAXC * 2] = {NULL};   /* array of pointers to char */
    int ch,
        i = 0;
    ...
    while ((ch = fgetc(fp)) != EOF) {   /* read each char until EOF */
        int word_index = 0;
        buff[i++] = ch;
        if (i == MAXC - 1 || ch == '\n') {      /* protect array bounds */
            char *wp;       /* word pointer */
            buff[i++] = 0;  /* nul-termiante buf */
            for (wp = strtok (buff, " :\n");    /* initial call to strtok */
                word_index < MAXC && wp;        /* check bounds & wp */
                wp = strtok (NULL, " :\n")) {   /* increment - next token */

(note: you should actually check if (i == MAXC - 1 || (i > 1 && ch == '\n')) to avoid attempting to tokenize empty-lines -- that is left to you. Also note a for loop provides a convenient means to cover both calls to strtok in a single expression)

Using strdup to Allocate/Copy

As mentioned above, all strdup does is to allocate storage for the word pointed to by wp above (including storage for the nul-terminating characters), copy the word to the newly allocated block of memory, and then return a pointer to the first address in that block allowing you to assign the starting address to your pointer. It is quite convenient, but since it allocates, you must validate, e.g.

                /* NOTE: strdup allocates - you must validate */
                if (!(word[word_index] = strdup (wp))) {
                    perror ("strdup(wp)");
                    exit (EXIT_FAILURE);
                }
                word_index++;   /* increment after allocation validated */

Using strlen + calloc + memcpy To Do The Same Thing

If strdup wasn't available, or you simply wanted to allocate and copy manually, then you would just do the exact same thing. (1) Get the length of the word (or token) pointed to by wp, (2) allocate length + 1 bytes; and (3) copy the string pointed to by wp to your newly allocated block of memory. (the assignment of the starting address of the new block to your pointer occurs at the point of allocation).

An efficiency nit about copying to your new block of memory. Since you have already scanned forward in the string pointed to by wp to find the length, there is no need to scan the string again by using strcpy. You have the length, so just use memcpy to avoid the second scan for end-of-string (it's trivial, but shows an understanding of what has taken place in your code). Using calloc you would do:

                /* using strlen + calloc + memcpy */
                size_t len = strlen (wp);     /* get wp length */
                /* allocate/validate */
                if (!(word[word_index] = calloc (1, len + 1))) {
                    perror ("calloc(1,len+1)");
                    exit (EXIT_FAILURE);
                }   /* you alread scanned for '\0', use memcpy */
                memcpy (word[word_index++], wp, len + 1);

Now if I were to do that for a whole line, when I find a new line in the file how do I reuse my char *line[2024] array?

Well, it's called word now, but as mentioned in the comment, you have kept track of the number of pointers filled with your line_index (my word_index) variables, so before you can allocate a new block of memory and assign a new address to your pointer (thereby overwriting the old address held by the pointer), you must free the block of memory at the address currently held by the pointer (or your will lose the ability to ever free that memory causing a memory leak). It is good (but optional) to set the pointer to NULL after freeing the memory.

(doing so ensures only valid pointers remain in your array of pointers allowing you to iterate the array, e.g. while (line[x] != NULL) { /* do something */ x++; } - useful if passing or returning a pointer to that array)

To reuse the free the memory, reset the pointers for reuse and reset your character index i = 0, you could do something like the following while outputting the words from the line, e.g.

            }
            for (int n = 0; n < word_index; n++) {  /* loop over each word */
                printf ("word[%2d]: %s\n", n, word[n]); /* output */
                free (word[n]);     /* free memory */
                word[n] = NULL;     /* set pointer NULL (optional) */
            }
            putchar ('\n');     /* tidy up with newline */
            i = 0;              /* reset i zero */
        }
    }
    if (fp != stdin) fclose (fp);   /* close file if not stdin */
}

Putting it altogether in an example that lets you choose whether to use strdup or use calloc depending on whether you pass the command line definition -DUSESTRDUP as part of your compiler string, you could do something like the following (note: I use fp instead of file for the FILE* pointer):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXC 1024   /* if you need a constant, #define one (or more) */

int main (int argc, char **argv) {

    char buff[MAXC] = "",           /* line buffer */
        *word[MAXC * 2] = {NULL};   /* array of pointers to char */
    int ch,
        i = 0;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    while ((ch = fgetc(fp)) != EOF) {   /* read each char until EOF */
        int word_index = 0;
        buff[i++] = ch;
        if (i == MAXC - 1 || ch == '\n') {      /* protect array bounds */
            char *wp;       /* word pointer */
            buff[i++] = 0;  /* nul-termiante buf */
            for (wp = strtok (buff, " :\n");    /* initial call to strtok */
                word_index < MAXC && wp;        /* check bounds & wp */
                wp = strtok (NULL, " :\n")) {   /* increment - next token */
#ifdef USESTRDUP
                /* NOTE: strdup allocates - you must validate */
                if (!(word[word_index] = strdup (wp))) {
                    perror ("strdup(wp)");
                    exit (EXIT_FAILURE);
                }
                word_index++;   /* increment after allocation validated */
#else
                /* using strlen + calloc + memcpy */
                size_t len = strlen (wp);     /* get wp length */
                /* allocate/validate */
                if (!(word[word_index] = calloc (1, len + 1))) {
                    perror ("calloc(1,len+1)");
                    exit (EXIT_FAILURE);
                }   /* you alread scanned for '\0', use memcpy */
                memcpy (word[word_index++], wp, len + 1);
#endif 
            }
            for (int n = 0; n < word_index; n++) {  /* loop over each word */
                printf ("word[%2d]: %s\n", n, word[n]); /* output */
                free (word[n]);     /* free memory */
                word[n] = NULL;     /* set pointer NULL (optional) */
            }
            putchar ('\n');     /* tidy up with newline */
            i = 0;              /* reset i zero */
        }
    }
    if (fp != stdin) fclose (fp);   /* close file if not stdin */
}

Compile

By default the code will use calloc for allocation, and a simple gcc compile string would be:

gcc -Wall -Wextra -pedantic -std=c11 -O3 -o strtokstrdupcalloc strtokstrdupcalloc.c

For VS (cl.exe) you would use

cl /nologo /W3 /wd4996 /Ox /Festrtokstrdupcalloc /Tc strtokstrdupcalloc.c

(which will create strtokstrdupcalloc.exe in the current directory on windows)

To compile using strdup, just add -DUSESTRDUP to either command line.

Example Input File

$ cat dat/captnjack.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.

Example Use/Output

$ ./bin/strtokstrdupcalloc dat/captnjack.txt
word[ 0]: This
word[ 1]: is
word[ 2]: a
word[ 3]: tale

word[ 0]: Of
word[ 1]: Captain
word[ 2]: Jack
word[ 3]: Sparrow

word[ 0]: A
word[ 1]: Pirate
word[ 2]: So
word[ 3]: Brave

word[ 0]: On
word[ 1]: the
word[ 2]: Seven
word[ 3]: Seas.

(output is the same regardless how you allocate)

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.

$ valgrind ./bin/strtokstrdupcalloc dat/captnjack.txt
==4946== Memcheck, a memory error detector
==4946== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==4946== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==4946== Command: ./bin/strtokstrdupcalloc dat/captnjack.txt
==4946==
word[ 0]: This
word[ 1]: is
word[ 2]: a
word[ 3]: tale

word[ 0]: Of
word[ 1]: Captain
word[ 2]: Jack
word[ 3]: Sparrow

word[ 0]: A
word[ 1]: Pirate
word[ 2]: So
word[ 3]: Brave

word[ 0]: On
word[ 1]: the
word[ 2]: Seven
word[ 3]: Seas.

==4946==
==4946== HEAP SUMMARY:
==4946==     in use at exit: 0 bytes in 0 blocks
==4946==   total heap usage: 17 allocs, 17 frees, 628 bytes allocated
==4946==
==4946== All heap blocks were freed -- no leaks are possible
==4946==
==4946== For counts of detected and suppressed errors, rerun with: -v
==4946== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.

Look things over and let me know if you have further questions.

David C. Rankin
  • 81,885
  • 6
  • 58
  • 85
0

The fastest, most readable and most portable way to allocate new room for a string is to use malloc+memcpy:

size_t size = strlen(old_str) + 1;
...
char* new_str = malloc (size);
if(new_str == NULL)
{ /* error handling */ }

memcpy(new_str, old_str, size);

There exists no argument against using this method when you know the length in advance.

Notes about inferior methods:

  • strcpy is needlessly slow when you already know the length.
  • strncpy -"-. And also dangerous since it is easy to miss null termination.
  • calloc is needlessly slow as it will do a zero-out of all memory.
  • strdup is needlessly slow and also non-standard, so it is non-portable and might not compile.
Lundin
  • 195,001
  • 40
  • 254
  • 396