I've got a very large text file that I'm trying to do word analysis on. Besides the word count I might be looking for other information as well, but I left that out for simplicity. In this text file I have blocks of text separated by asterisks ('*'). The code below scans the text file and prints the number of characters and words as it should, but I'd like to reset the counters whenever an asterisk is met and store all the information in a table of some sort. I'm not so worried about how I'll make the table; what I'm unsure of is how to loop the same counting code over each text block between asterisks.

Maybe a for loop like

    for (arr = strstr(arr, "*"); arr; arr = strstr(arr + strlen("*"), "*"))

Example text file:

    =-=-=-=-=-=-=-=-=-=-=-=-=-=-
    I have a sentence. I have two sentences now.
    *
    I have another sentence. And another.
    *
    I'd like to count the amount of words and characters from the asterisk above this 
    one until the next asterkisk, not including the count from the last one.
    *
    ...
    ...
    -=-=-=-=-=-=-=-=-=-=-=-=-=-=-
    (EOF)

Desired output:

    *#      #words     #alphaChar
    ----------------------------
    1        9           34  
    -----------------------------
    2        5           30
    -----------------------------
    3       28           124
    ...
    ...


I have tried

        #include <ctype.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        int main(void)
          {
          int characterCount=0;
          int counterPosition, wordCount=0, alphaCount=0;
          int i=0;

          //input file
          FILE *file= fopen("test.txt", "r");
          if (file== NULL)
            {
            printf("Cannot find the file.\n");
            return 1;
            }

          //Count total number of characters in file
          while (1)
              {
              counterPosition = fgetc(file);
              if (counterPosition == EOF)
                break;
              ++characterCount;
              }

          rewind(file); // Sends the pointer to the beginning of the file

          //Dynamically allocate since array size cant be variable
          //(one extra byte for a terminating '\0')
          char *arr= malloc(characterCount + 1);
          if (arr == NULL)
            {
            fclose(file);
            return 1;
            }

          //Scan until the end of file, storing each character in a unique position
          while(i < characterCount && fscanf(file, "%c", &arr[i]) == 1)
            i++;
          arr[i] = '\0';
          characterCount = i; //number of characters actually read

          for(i = 0; i <characterCount; i++)
              {
              if(arr[i] == ' ') //count words
                wordCount++;

              if(isalpha((unsigned char)arr[i]))  //count letters only
                alphaCount++;

              }//end for loop

          printf("word count is %d and alpha count is %d\n", wordCount,alphaCount);

          free(arr);
          fclose(file);
          return 0;
          }
  • wordcount and alphacount are uninitialised. Also you increment wordcount on every space, so " " (2 spaces) would count as two words. And `char *arr= ( char*) malloc(totalCharacterCount*sizeof(int));` could be `char *arr= malloc(totalCharacterCount);` – wildplasser Mar 26 '14 at 19:15
  • Added the fixes. This wasn't the problem though. I rewrote most of the code instead of copying and pasting, so initializing slipped my mind – user3465668 Mar 26 '14 at 19:22
  • Why does the program make three passes (two over the file, one over the array) when only one pass is needed? Also: the program seems to be barely related to your goals. – wildplasser Mar 26 '14 at 19:33
  • What do you mean by three passes? I'm new to this. Also, the program is quite related to my goals. I will "draw" the chart myself. I was simply asking how to reset a counter in between each set of asterisks. – user3465668 Mar 26 '14 at 19:34
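
To illustrate the single-pass idea raised in the comments, here is a minimal sketch that reads the file once with `fgetc` and resets the counters at every asterisk. The names `struct block_counts` and `count_stream` are hypothetical and not from the original post. It also assumes a word is any run of non-whitespace characters, which differs from counting spaces as the question's code does.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical per-block totals; the names are illustrative only. */
struct block_counts {
    int words;
    int alpha;
};

/* Read fp exactly once. Every '*' closes the current block and resets
   the counters. Stores at most max_blocks results in out[] and returns
   the number of blocks stored. */
int count_stream(FILE *fp, struct block_counts *out, int max_blocks)
{
    struct block_counts cur = {0, 0};
    int in_word = 0;   /* 1 while we are inside a run of non-space chars */
    int n = 0;
    int c;

    while ((c = fgetc(fp)) != EOF) {
        if (c == '*') {
            if (n == max_blocks)
                return n;          /* table is full, stop early */
            out[n++] = cur;        /* store the finished block */
            cur.words = 0;         /* reset for the next block */
            cur.alpha = 0;
            in_word = 0;
            continue;
        }
        if (isalpha(c))
            cur.alpha++;
        if (isspace(c)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;           /* first character of a new word */
            cur.words++;
        }
    }

    /* Keep a trailing block that has text but no closing asterisk. */
    if (cur.words > 0 && n < max_blocks)
        out[n++] = cur;
    return n;
}
```

A caller would open the file with `fopen`, pass the `FILE *` together with an array of `struct block_counts`, and then print one table row per returned entry. Because nothing is buffered, this variant needs neither the character-counting pass nor the `malloc`.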

1 Answer

Since you have the full file's text in the array arr[], you need to split that string using * as the delimiter. You can use strtok() to do that, then perform the word count and character count on each token. Read this link to learn about strtok.
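
As a rough sketch of that approach (not code from the question), the function below splits a buffer on `*` with `strtok` and counts each token separately. The names `struct block_stats`, `count_block` and `count_blocks` are made up for illustration. It assumes the buffer is writable and ends with a `'\0'`, since `strtok` needs both, and it treats a word as a run of non-whitespace characters.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical record for one block's totals (not from the original post). */
struct block_stats {
    int words;
    int alpha;
};

/* Count words and letters in one NUL-terminated piece of text. */
struct block_stats count_block(const char *text)
{
    struct block_stats s = {0, 0};
    int in_word = 0;

    for (const char *p = text; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (isalpha(c))
            s.alpha++;
        if (isspace(c)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;   /* first character of a new word */
            s.words++;
        }
    }
    return s;
}

/* Split buf on '*' and fill out[] with per-block totals.
   buf is modified in place by strtok. Returns the number of blocks stored. */
int count_blocks(char *buf, struct block_stats *out, int max_blocks)
{
    int n = 0;

    for (char *tok = strtok(buf, "*");
         tok != NULL && n < max_blocks;
         tok = strtok(NULL, "*"))
        out[n++] = count_block(tok);
    return n;
}
```

Two things to keep in mind: `strtok` skips empty tokens, so two asterisks in a row do not produce an empty block, and a newline left after the last asterisk shows up as a final block with zero words, which you may want to filter out before printing the table.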

LearningC
  • I remember someone mentioning strtok in another post, but for some reason I thought it was irrelevant. What you said makes sense, I'll give it a try. Thanks! – user3465668 Mar 26 '14 at 19:37
  • @user3465668 OK. There are already questions on Stack Overflow discussing the use of strtok(), so I didn't add code. Read one of those; check [this](http://stackoverflow.com/questions/8106765/using-strtok-in-c) to learn about using strtok. It may help with your coding – LearningC Mar 26 '14 at 19:42