
This is for an introductory college CS course I am taking. I have been stuck on this problem for a few days, and our department resources have been swamped by other students working on this assignment. Searching has not helped much, but I'm not sure I'm wording my searches correctly.

I am trying to write a program which reads words from a text file, compares them against another text file containing a list of correctly spelled words, and prints the incorrect words from the first text file.

I have written one while(fgets) loop to read each line of the input text file, a while loop nested inside it to tokenize each line into individual words, and finally another while(fgets) loop nested inside that to compare each token against each line in the dictionary file. I have seen some questions in which the inside loop has to be "reset", but I am using the strtok function to do that.

Here is a link to a gist of the code with samples of the input files.

Here is the output from this input:

Misspelled words in words.txt :
hello, A do not match
hello, AA do not match
hello, AAA do not match
hello, thirtytwomo do not match
hello, this do not match
hello, thisaway do not match
hello, contains do not match
hello, few do not match
hello, words do not match
hello, hello match
this
contanes
a
few
words

they 
are
seperated
by
multaple

And this is the relevant loop in question:

while (fgets(tempBuffer, sizen, instream) != NULL) {

    //remove the endline character from the line
    tempBuffer[strlen(tempBuffer) - 1] = '\0';

    //tokenize the line on space:
    char *tempToken = strtok(tempBuffer, " ");

    //will be null at the end of each line
    while (tempToken != NULL) {
        //build dynamic array to hold string to check
        char *tempCheck = malloc(sizeof(char) * (strlen(tempToken) + 1));
        strcpy(tempCheck, tempToken);

        //compares against each line in dictionary
        while (fgets(tempDictBuffer, sizen, dictInstream) != NULL) {

            //remove the endline character from the line
            tempDictBuffer[strlen(tempDictBuffer) - 1] = '\0';

            //build dynamic array to hold string from dictionary
            char *tempDict = malloc(
                    sizeof(char) * (strlen(tempDictBuffer) + 1));
            strcpy(tempDict, tempDictBuffer);

            if (strcmp(tempCheck, tempDict) == 0) {
                //if the string matches a dictionary line, this prints
                printf("%s, %s match\n", tempCheck, tempDict);
                //sets flag
                result = 1;
            } else {
                //if the string does not match, this prints
                printf("%s, %s do not match\n", tempCheck, tempDict);
            }

            free(tempDict);
            tempDict = NULL;
        }

        //checks flag
        if (result != 1) {
            printf("%s\n", tempCheck);
        }
        //resets flag
        result = 0;
        free(tempCheck);
        tempCheck = NULL;
        //gets next token in line and reruns second while loop
        tempToken = strtok(NULL, " ");
    }
}

Thanks for any help you can provide!

Juggerbot
  • On an unrelated note, `fgets` may *not* add the newline to the buffer if the buffer size is too small to fit the whole line. You should really check for it first. – Some programmer dude Apr 14 '15 at 01:02
  • Also, there's no need for the temporary memory you allocate; you can compare e.g. `tempToken` and `tempDictBuffer` directly. – Some programmer dude Apr 14 '15 at 01:03
  • You need to `rewind` the dictionary file for each test word. – BLUEPIXY Apr 14 '15 at 01:07
  • `malloc()/strcpy()` can be replaced with a single call to `strdup()`. – Andrew Henle Apr 14 '15 at 01:33
  • If `fgets` isn't adding a newline, are you saying the `sizen` variable I'm using isn't big enough? It's an int set to 100 at the moment to reduce complexity while I get this figured out. – Juggerbot Apr 14 '15 at 01:50
  • What he is saying is: check before overwriting the end of your line with the `null-terminating char`, or you might chop off a valid char. E.g. check with `size_t n = strlen(tempBuffer); while (n > 0 && tempBuffer[n - 1] == '\n') tempBuffer[--n] = 0;` – David C. Rankin Apr 14 '15 at 02:09
  • You can eliminate some uncertainty by using `getline` instead of `fgets` and allowing `getline` to allocate space for your line as needed. (see **man getline**) If you set `lineptr=NULL`, it will force `getline` to allocate space sufficient to hold your line. (it will also return the number of characters actually read, eliminating the need to call `strlen`) – David C. Rankin Apr 14 '15 at 02:14
  • Suggest: read all of the dictionary before reading the test words; all those malloc/free operations are very 'expensive'. After reading all the way through the dictionary file, the file pointer will be at the end of the file, and any further read will do nothing but return NULL (with the end-of-file indicator set). Suggest setting the file pointer back to the beginning of the dictionary file, perhaps by using `fseek( dictInstream, 0, SEEK_SET );`. Always check (!= NULL) the return value of each malloc to ensure the operation was successful. – user3629249 Apr 14 '15 at 02:30
  • On certain OSs (Windows, DOS), the newline is a two-character sequence (`\r\n`). Suggest finding/replacing the line ending with `char *offset = strpbrk(tempBuffer, "\r\n"); if (NULL != offset) { *offset = '\0'; }` – user3629249 Apr 14 '15 at 02:35
  • Regarding lines like `while (fgets(tempDictBuffer, sizen, dictInstream) != NULL) {`: sizen must match the actual size of tempDictBuffer, which can be a problem. Suggest `while (fgets(tempDictBuffer, sizeof(tempDictBuffer), dictInstream) != NULL) {` (this only works if tempDictBuffer is declared as an array; for a pointer, sizeof gives the size of the pointer). – user3629249 Apr 14 '15 at 02:38
  • From what I read, fgets will read characters until a null terminating character OR after the number of chars specified by the second argument, _whichever comes first_. Is that incorrect? – Juggerbot Apr 14 '15 at 17:04
  • After going back to fseek and actually getting it to work, it now restarts the loop as desired. Thanks a lot for the help, and hopefully I can also eliminate the redundant buffer strings. – Juggerbot Apr 15 '15 at 01:45
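Putting the comments' suggestions together, a minimal sketch of the corrected check might look like this. It assumes one dictionary word per line and lines that fit in a fixed buffer; `DICT_LINE_LEN`, `strip_newline`, `in_dictionary` and `print_misspelled` are hypothetical names, and `strcspn` is swapped in as one way to strip a `\n` or `\r\n` ending. The key change is the `rewind()` before each dictionary search, and tokens are compared against the dictionary buffer directly, without the extra `malloc`/`strcpy` copies.

```c
#include <stdio.h>
#include <string.h>

#define DICT_LINE_LEN 100  /* hypothetical buffer size, plays the role of sizen */

/* Cut the string at the first '\r' or '\n', if any. Unlike
   s[strlen(s) - 1] = '\0', this never removes a real character
   when fgets stopped early because the line did not fit. */
static void strip_newline(char *s)
{
    s[strcspn(s, "\r\n")] = '\0';
}

/* Return 1 if word matches a whole line of dict, else 0.
   rewind() restarts the search from the top of the file; without it,
   every search after the first starts at end-of-file and fgets
   returns NULL immediately. */
static int in_dictionary(FILE *dict, const char *word)
{
    char line[DICT_LINE_LEN];

    rewind(dict);
    while (fgets(line, sizeof line, dict) != NULL) {
        strip_newline(line);
        if (strcmp(word, line) == 0)
            return 1;
    }
    return 0;
}

/* Print each word of in that is not found in dict, one per line,
   to out; return the number of such words. */
static size_t print_misspelled(FILE *in, FILE *dict, FILE *out)
{
    char line[DICT_LINE_LEN];
    size_t count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        strip_newline(line);
        /* strtok only ever walks the input line; the dictionary search
           uses its own buffer and does not call strtok */
        for (char *tok = strtok(line, " "); tok != NULL; tok = strtok(NULL, " ")) {
            if (!in_dictionary(dict, tok)) {
                fprintf(out, "%s\n", tok);
                count++;
            }
        }
    }
    return count;
}
```

A `main` would open both files and call something like `print_misspelled(instream, dictInstream, stdout)`. Rewinding still costs one full pass over the dictionary per word; reading the dictionary into memory once avoids that.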

1 Answer


This is somewhat coincidental, but I happened to have a few functions that do essentially what you are trying to do. It may not be a perfect fit, but the following will read 2 files, load the lines of each into an array of pointers to char, then split each line into tokens, compare the corresponding tokens to determine whether the spelling differs, and output the words on each line that are not spelled the same.

It may provide you with a few additional ideas about how to approach your problem. Note that this is an example, and is not represented to be fully tested for all corner cases. Since you were allocating storage dynamically, it helps to continue that approach when tokenizing each line. A function that fully tokenizes each line and returns the words in an array of pointers to char substantially cuts down the number and type of nested loops required. For what it's worth, take a look. Also note that the prn_chararray function is not used in the code below, but is left in as a convenience:

#define _POSIX_C_SOURCE 200809L  /* expose getline, strdup and ssize_t */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NMAX 256
#define BUFL 64
#define MAXS 32

char **readtxtfile (char *fn, size_t *idx);
char **splitstr (char *s, size_t *n);
char **realloc_char (char **p, size_t *n);
void prn_chararray (char **ca);
void free_chararray (char **ca);

int main (int argc, char **argv) {

    if (argc < 3 ) {
        fprintf (stderr, "error: insufficient input, usage: %s <filename1> <filename2>\n", argv[0]);
        return 1;
    }

    size_t file1_size = 0;  /* placeholders to be filled by readtxtfile */
    size_t file2_size = 0;  /* for general use, not needed to iterate   */
    size_t i = 0;           /* general counter/iterator                 */
    size_t linemin = 0;     /* minimum of comparison lines in file1/2   */

    /* read each file into an array of strings,
       number of lines read, returned in file_size */
    char **file1 = readtxtfile (argv[1], &file1_size);
    char **file2 = readtxtfile (argv[2], &file2_size);

    linemin = file1_size < file2_size ? file1_size : file2_size;

    for (i = 0; i < linemin; i++)
    {
        size_t nwords1 = 0;     /* number of words read file1 line          */
        size_t nwords2 = 0;     /* number of words read file2 line          */
        size_t wordmin = 0;     /* minimum number of words in file1/2 lines */
        size_t j = 0;           /* general counter/iterator                 */

        printf ("\n file1[%2zu] : %s\n file2[%2zu] : %s\n\n", i, file1[i], i, file2[i]);

        char **f1words = splitstr (file1[i], &nwords1);
        char **f2words = splitstr (file2[i], &nwords2);

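        if (!f1words || !f2words) {
            fprintf (stderr, "error: empty line or word splitting failure.\n");
            free_chararray (f1words);   /* free whichever split succeeded  */
            free_chararray (f2words);   /* (free_chararray ignores NULL)   */
            continue;
        }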

        wordmin = nwords1 < nwords2 ? nwords1 : nwords2;
        for (j = 0; j < wordmin; j++)
        {
            if (strcmp (f1words[j], f2words[j]))
                printf ("  %16s  !=  %s\n", f1words[j], f2words[j]);
        }

        free_chararray (f1words);
        free_chararray (f2words);

        f1words = NULL;
        f2words = NULL;
    }

    /* simple free memory function */
    if (file1) free_chararray (file1);
    if (file2) free_chararray (file2);

    return 0;
}

char** readtxtfile (char *fn, size_t *idx)
{
    if (!fn) return NULL;           /* validate filename provided       */

    char *ln = NULL;                /* NULL forces getline to allocate  */
    size_t n = 0;                   /* max chars to read (0 - no limit) */
    ssize_t nchr = 0;               /* number of chars actually read    */
    size_t nmax = NMAX;             /* check for reallocation           */
    char **array = NULL;            /* array to hold lines read         */
    FILE *fp = NULL;                /* file pointer to open file fn     */

    /* open / validate file */
    if (!(fp = fopen (fn, "r"))) {
        fprintf (stderr, "%s() error: file open failed '%s'.", __func__, fn);
        return NULL;
    }

    /* allocate NMAX pointers to char* */
    if (!(array = calloc (NMAX, sizeof *array))) {
        fprintf (stderr, "%s() error: memory allocation failed.", __func__);
        return NULL;
    }

    /* read each line from fp; getline allocates/grows ln as needed */
    while ((nchr = getline (&ln, &n, fp)) != -1)
    {
        /* strip newline or carriage rtn    */
        while (nchr > 0 && (ln[nchr-1] == '\n' || ln[nchr-1] == '\r'))
            ln[--nchr] = 0;

        array[*idx] = strdup (ln);  /* allocate/copy ln to array        */

        (*idx)++;                   /* increment value at index         */

        if (*idx == nmax)           /* if lines exceed nmax, reallocate */
            array = realloc_char (array, &nmax);
    }

    if (ln) free (ln);              /* free memory allocated by getline */
    if (fp) fclose (fp);            /* close open file descriptor       */

    return array;
}

/* split string 's' into separate words including break on
space as well as non-printing and format characters
return pointer to array of pointers to strings 'a' and
number of words in 'n' */
char **splitstr (char *s, size_t *n)
{
    if (!s || !*s ) return NULL;

    char *p = s;                            /* pointer to char          */
    char buf[BUFL] = {0};                   /* temporary buffer         */
    char *bp = buf;                         /* pointer to buf           */
    size_t maxs = MAXS;                     /* check for reallocation   */
    *n = 0;                                 /* index number of tokens   */

    /* allocate and validate array of pointer to char */
    char **a = calloc (MAXS, sizeof *a);
    if (!a) {
        fprintf (stderr, "%s() error: memory allocation failed.\n", __func__);
        return NULL;
    }

    while (*p)                              /* for each char in string1 */
    {
        /* skip each non-print/format char */
        while (*p && (*p <= ' ' || *p > '~'))
            p++;

        if (!*p) break;                     /* break if end reached     */

        /* copy each printable char, leaving room in buf for the '\0'
           (a word longer than BUFL - 1 chars is split into pieces) */
        while (*p > ' ' && *p <= '~' && bp < buf + BUFL - 1)
        {
            *bp = *p++;                     /* copy to strings buffer   */
            bp++;                           /* advance to next position */
        }

        *bp = 0;                            /* null-terminate strings   */
        a[*n] = strdup (buf);               /* alloc/copy buf to a[*n]  */
        (*n)++;                             /* next index in strings    */

        if (*n == maxs)                     /* check if *n exceeds maxs */
            a = realloc_char (a, &maxs);    /* realloc a if required    */

        bp = buf;                           /* reset bp to start of buf */
    }

    return a;
}

/* print an array of character pointers. */
void prn_chararray (char **ca)
{
    register size_t n = 0;
    while (ca[n])
    {
        printf (" arr[%3zu]  %s\n", n, ca[n]);
        n++;
    }
}

/* free array of char* */
void free_chararray (char **ca)
{
    if (!ca) return;
    register size_t n = 0;
    while (ca[n])
        free (ca[n++]);
    free (ca);
}

/*  realloc an array of pointers to strings setting memory to 0.
 *  reallocate an array of character arrays setting
 *  newly allocated memory to 0 to allow iteration
 */
char **realloc_char (char **p, size_t *n)
{
#ifdef DEBUG
    printf ("\n  reallocating %zu to %zu (size: %zu)\n", *n, *n * 2, 2 * *n * sizeof *p);
#endif
    char **tmp = realloc (p, 2 * *n * sizeof *p);
    if (!tmp) {
        fprintf (stderr, "%s() error: reallocation failure.\n", __func__);
        // return NULL;
        exit (EXIT_FAILURE);
    }
    p = tmp;
    memset (p + *n, 0, *n * sizeof *p); /* memset new ptrs 0 */
    *n *= 2;

    return p;
}

Input Files

$ cat dat/words1.txt
Eye have a spelling chequer,
It came with my Pea Sea.
It plane lee marks four my revue,
Miss Steaks I can knot sea.
Eye strike the quays and type a whirred,
And weight four it two say,
Weather eye am write oar wrong,
It tells me straight aweigh.
Eye ran this poem threw it,
Your shore real glad two no.
Its vary polished in its weigh.
My chequer tolled me sew.
A chequer is a bless thing,
It freeze yew lodes of thyme.
It helps me right all stiles of righting,
And aides me when eye rime.
Each frays come posed on my screen,
Eye trussed too bee a joule.
The chequer pours over every word,
Two cheque sum spelling rule.

$ cat dat/words2.txt
I have a spelling checker,
It came with my Pin See.
It plainly skips marks for my revue,
Mistakes skip I can not see.
I strike the keys and type a word,
And wait for it to say,
Whether I am right or wrong,
It tells me straight away.
I ran this poem through it,
Your are real glad too no.
Its very polished in its way.
My checker told me so.
A checker is a blessed thing,
It frees you lots of time.
It helps me write all styles of writing,
And helps me when I rhyme.
Each pharse composed up on my screen,
I trust too bee a jewel.
The checker pours over every word,
Two check some spelling rule.

Output

$ ./bin/getline_cmplines dat/words1.txt dat/words2.txt

 file1[ 0] : Eye have a spelling chequer,
 file2[ 0] : I have a spelling checker,

               Eye  !=  I
          chequer,  !=  checker,

 file1[ 1] : It came with my Pea Sea.
 file2[ 1] : It came with my Pin See.

               Pea  !=  Pin
              Sea.  !=  See.

 file1[ 2] : It plane lee marks four my revue,
 file2[ 2] : It plainly skips marks for my revue,

             plane  !=  plainly
               lee  !=  skips
              four  !=  for

 file1[ 3] : Miss Steaks I can knot sea.
 file2[ 3] : Mistakes skip I can not see.

              Miss  !=  Mistakes
            Steaks  !=  skip
              knot  !=  not
              sea.  !=  see.

 file1[ 4] : Eye strike the quays and type a whirred,
 file2[ 4] : I strike the keys and type a word,

               Eye  !=  I
             quays  !=  keys
          whirred,  !=  word,

 file1[ 5] : And weight four it two say,
 file2[ 5] : And wait for it to say,

            weight  !=  wait
              four  !=  for
               two  !=  to

 file1[ 6] : Weather eye am write oar wrong,
 file2[ 6] : Whether I am right or wrong,

           Weather  !=  Whether
               eye  !=  I
             write  !=  right
               oar  !=  or
<snip>

Leak Check

$ valgrind ./bin/getline_cmplines dat/words1.txt dat/words2.txt
==5670== Memcheck, a memory error detector
==5670== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==5670== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==5670== Command: ./bin/getline_cmplines dat/words1.txt dat/words2.txt
==5670==

 file1[ 0] : Eye have a spelling chequer,
 file2[ 0] : I have a spelling checker,

               Eye  !=  I
          chequer,  !=  checker,

<snip>

==5670==
==5670== HEAP SUMMARY:
==5670==     in use at exit: 0 bytes in 0 blocks
==5670==   total heap usage: 330 allocs, 330 frees, 18,138 bytes allocated
==5670==
==5670== All heap blocks were freed -- no leaks are possible
==5670==
==5670== For counts of detected and suppressed errors, rerun with: -v
==5670== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
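To apply the same load-everything-first approach to a dictionary check, rather than a position-by-position comparison of two files, one option (not part of the code above) is to load the dictionary once, for example with `readtxtfile()`, sort it with `qsort()`, and look up each token with `bsearch()`. A minimal sketch, assuming the dictionary words are already in an array of `char *` with newlines stripped; `cmp_str`, `sort_words` and `is_known` are hypothetical helper names:

```c
#include <stdlib.h>
#include <string.h>

/* Comparator for an array of char*: qsort and bsearch pass pointers
   to the elements, so each argument is really a pointer to a pointer. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort the n dictionary words in place so they can be binary-searched. */
static void sort_words(char **words, size_t n)
{
    qsort(words, n, sizeof *words, cmp_str);
}

/* Return 1 if word is in the sorted array words[0..n-1], else 0.
   The key is passed as &word so cmp_str sees the same
   pointer-to-pointer shape for both arguments. */
static int is_known(char **words, size_t n, const char *word)
{
    return bsearch(&word, words, n, sizeof *words, cmp_str) != NULL;
}
```

For example, after `char **dict = readtxtfile(argv[2], &ndict); sort_words(dict, ndict);`, each token `f1words[j]` returned by `splitstr()` could be tested with `is_known(dict, ndict, f1words[j])`. Note that `splitstr()` keeps punctuation attached to words (e.g. `chequer,` in the output above), so tokens would need trimming before the lookup.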
David C. Rankin