Problem solved here: fgetc not starting at beginning of large txt file

I am working in C, and fgetc isn't getting chars from the beginning of the file. It seems to start at a random point within the file, just after a \n. The goal of this function is to modify the array productsPrinted: if "More Data Needed" or "Hidden non listed" is encountered, the entry at productsPrinted[newLineCount] is set to 0. Any help is appreciated.

Update: It works on smaller files, but doesn't start at the beginning of the larger, 617 KB file.

function calls up to category:

findNoPics(image, productsPrinted);
findVisible(visible, productsPrinted);
removeCategories(category, productsPrinted);

example input from fgetc():

Category\n
Diagnostic & Testing /Scan Tools\n
Diagnostic & Testing /Scan Tools\n
Hidden non listed\n
Diagnostic & Testing /Scan Tools\n
Diagnostic & Testing /Scan Tools\n
Hand Tools/Open Stock\n
Hand Tools/Sockets and Drive Sets\n
More Data Needed\n
Hand Tools/Open Stock\n
Hand Tools/Open Stock\n
Hand Tools/Open Stock\n
Shop Supplies & Equip/Tool Storage\n
Hidden non listed\n
Shop Supplies & Equip/Heaters\n

Code:

void removeCategories(FILE *category, int *prodPrinted){

char more[17] = { '\0' }, hidden[18] = { '\0' };
int newLineCount = 0, i, ch = 'a', fix = 0;

while ((ch = fgetc(category)) != EOF){  //if fgetc is outside while, it works//

    more[15] = hidden[16] = ch;
    printf("%c", ch);

    /*shift char in each list <- one*/
    for (i = 0; i < 17; i++){
        if (i < 17){
            hidden[i] = hidden[i + 1];
        }
        if (i < 16){
            more[i] = more[i + 1];
        }
    }

    if (strcmp(more, "More Data Needed") == 0 || strcmp(hidden, "Hidden non listed") == 0){
        prodPrinted[newLineCount] = 0;
        /*printf("%c", more[0]);*/
    }
    if (ch == '\n'){
        newLineCount++;
    }
} 

}

TinMan
  • [`fseek`](http://www.cplusplus.com/reference/cstdio/fseek/) to the beginning of the file first (I'm guessing you're using this `FILE *` in other places or calling this function multiple times). – amdixon Nov 24 '13 at 01:00
  • This is very closely related to [`fgetc()` not working — returns same char repeatedly](http://stackoverflow.com/questions/20158061/fgetc-not-working-c-returns-same-char-repeatedly). The specific flaw identified for that question has been fixed (it would be good if you accepted the answer — it lets people know you appreciate their help). The loops _have_ changed; the indentation is still erratic. The structure is similar — and the intent still ill defined. – Jonathan Leffler Nov 24 '13 at 02:08
  • (a) What is this program/function trying to do; (b) what does the calling code look like? You should review what happens when you read EOF (you certainly do processing of data after it occurs as if it has not occurred). It would help enormously to have a few (2-5) lines of input data, and the expected output from those lines of input. – Jonathan Leffler Nov 24 '13 at 02:13
  • You don't null terminate your strings properly. The read/assignment line `more[15] = hidden[16] = ch = fgetc(category);` writes over the nulls at the end of `more` and `hidden`, leaving you with strings that have no null terminator, so the `strcmp()` operations fail when you finally get characters moved to the start of the strings. – Jonathan Leffler Nov 24 '13 at 02:28

3 Answers

Let computers do the counting. You have not null terminated your strings properly. The fixed strings (mdn and hnl) must be sized to hold their null terminators (17 and 18 bytes in the program below); without a terminator, string comparisons using them are undefined.
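A minimal sketch of what "letting the computer count" could look like: with empty brackets the compiler sizes each array itself, terminator included. The helper name `window_matches` and the `memcmp`-based comparison are illustrative assumptions, not part of the program in this answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Empty brackets: the compiler counts the characters and adds the '\0'. */
static const char mdn[] = "More Data Needed";    /* sizeof mdn == 17 */
static const char hnl[] = "Hidden non listed";   /* sizeof hnl == 18 */

/* By contrast, `char bad[16] = "More Data Needed";` is legal C but leaves
   no room for the '\0', so `bad` must never be passed to strcmp(). */

/* Hypothetical helper: compare exactly n bytes of a window against a
   phrase, so the result does not depend on a terminator in the window. */
static int window_matches(const char *window, const char *phrase, size_t n)
{
    assert(window != NULL && phrase != NULL);
    return memcmp(window, phrase, n) == 0;
}
```

Passing `sizeof mdn - 1` as `n` compares the 16 visible characters only, which keeps the length in one place instead of in hand-counted literals.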

Given this sample data:

Example 1
More Data Needed
Hidden non listed
Example 2
Keeping lines short.
But as they get longer, the overwrite is worse...or is it?
Hidden More Data Needed in a longer line.
Lines containing "Hidden non listed" are zapped.
Example 3

This version of the program:

#include <stdio.h>
#include <string.h>

static
void removeCategories(FILE *category, int *prodPrinted)
{
    char more[17] = { '\0' };      /* 16 characters + terminator */
    char hidden[18] = { '\0' };    /* 17 characters + terminator */
    char mdn[17] = { "More Data Needed" };
    char hnl[18] = { "Hidden non listed" };
    int newLineCount = 0, i, ch = '\0';

    do
    {
        /*shift char in each list <- one*/
        for (i = 0; i < 18; i++)
        {
            if (i < 17)
                hidden[i] = hidden[i + 1];
            if (i < 16)
                more[i] = more[i + 1];
        }
        more[15] = hidden[16] = ch = fgetc(category);
        if (ch == EOF)
            break;
        printf("%c", ch);           /*testing here, starts rndmly in file*/
        //printf("<<%c>> ", ch);           /*testing here, starts rndmly in file*/

        //printf("more <<%s>> hidden <<%s>>\n", more, hidden);
        if (strcmp(more, mdn) == 0 || strcmp(hidden, hnl) == 0)
        {
            prodPrinted[newLineCount] = 0;
        }
        if (ch == '\n')
        {
            newLineCount++;
        }
    } while (ch != EOF);
}

int main(void)
{
    int prod[10];
    for (int i = 0; i < 10; i++)
        prod[i] = 37;
    removeCategories(stdin, prod);
    for (int i = 0; i < 10; i++)
        printf("%d: %d\n", i, prod[i]);
    return 0;
}

produces this output:

Example 1
More Data Needed
Hidden non listed
Example 2
Keeping lines short.
But as they get longer, the overwrite is worse...or is it?
Hidden More Data Needed in a longer line.
Lines containing "Hidden non listed" are zapped.
Example 3
0: 37
1: 0
2: 0
3: 37
4: 37
5: 37
6: 0
7: 0
8: 37
9: 37
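
For comparison, a line-based variant is one possible alternative to the sliding window above. This is a sketch under assumptions, not the code from this answer: the name `markHiddenLines`, the `maxLines` parameter, and the 256-byte buffer are all hypothetical, and a phrase split across two buffer chunks would be missed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical line-based variant: zero prodPrinted[line] for every line
   that contains either phrase. maxLines guards the array bound. */
static void markHiddenLines(FILE *category, int *prodPrinted, size_t maxLines)
{
    char buf[256];      /* assumption: a phrase never straddles two chunks */
    size_t line = 0;

    assert(category != NULL && prodPrinted != NULL);
    while (line < maxLines && fgets(buf, sizeof buf, category) != NULL) {
        if (strstr(buf, "More Data Needed") != NULL ||
            strstr(buf, "Hidden non listed") != NULL) {
            prodPrinted[line] = 0;
        }
        /* Advance only when this chunk actually ended the line. */
        if (strchr(buf, '\n') != NULL) {
            line++;
        }
    }
}
```

Unlike the character-by-character loop, this version stops once `maxLines` entries have been visited, so it cannot write past the end of the array.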
Jonathan Leffler
  • Added some sample input and function calls, please take a look. – TinMan Nov 24 '13 at 03:06
  • I think I did a respectable job of guessing what your data looks like, and what the data structure looks like. The analysis in my answer stands as accurate. I changed my code to use an array of 20 (more than sufficient to hold data for 15 lines of sample data), and then tried it on your sample data, and the entries corresponding to lines 4, 9, and 14 were zeroed, as you seem to expect. – Jonathan Leffler Nov 24 '13 at 03:29
Maybe you can try rewinding the file pointer at the beginning of your function.

 rewind(category);

Most likely another function is reading from the same file. If this solves your problem, it would be better to find which other function (or previous call to this function) is reading from the same file and make sure rewinding the pointer won't break something else.

EDIT:

And just to be sure, you could change the double assignment into two separate statements. Based on this post, your problem might also be caused by a compiler optimization of that line. I haven't checked the standard, but according to the top answer there, the behavior in C and C++ might be undefined, which would explain your strange results. Good luck.
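
A minimal sketch of both suggestions, assuming a seekable regular file. The helper names `firstCharAfterRewind` and `storeChar` are hypothetical. Whether splitting the assignment changes anything is not established here; the split form is simply easier to step through in a debugger.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: rewind the stream, report where it now points,
   and return the first character (or EOF). */
static int firstCharAfterRewind(FILE *category, long *posOut)
{
    assert(category != NULL);
    rewind(category);                  /* also clears EOF and error flags */
    if (posOut != NULL) {
        *posOut = ftell(category);     /* 0 at the start; -1L on failure */
    }
    return fgetc(category);
}

/* Hypothetical helper: the double assignment written as two statements.
   Index 15 of more[17] and index 16 of hidden[18] leave the final byte
   of each array free for the terminator. */
static void storeChar(char *more, char *hidden, int ch)
{
    assert(more != NULL && hidden != NULL);
    more[15] = (char)ch;
    hidden[16] = (char)ch;
}
```

If the reported position is 0 and the returned character is the first one in the file, the stream itself is starting where expected.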

Community
  • The file is used in only one function. Tried adding rewind(category) with same results from the printf. – TinMan Nov 24 '13 at 02:22
  • I see a few problems in your code. There is a loop running until the control variable reaches 17, and inside you have only 2 conditionals that will never be true if your control variable reaches 17. Another suggestion would be to always use braces for 'if', 'else', 'while', etc. regardless of whether you only have one instruction or more inside. You are also not checking for errors when reading from stdin. This discussion might help you: http://stackoverflow.com/questions/3495092/read-from-file-or-stdin-c – Dissident penguin Nov 24 '13 at 10:59
  • In the above-mentioned link you can find the solution to your problem. stdin is a stream and not a FILE *. All the other problems in your code should also be fixed, but the answer to your question is there already. Good luck. – Dissident penguin Nov 25 '13 at 11:02
  • Also the first "if" inside your "for" loop is a tautology. It will always be true since your control variable will always be < 17. You can simply remove it. – Dissident penguin Nov 26 '13 at 10:31
  • One last suggestion would be to check the address of stdin at the beginning of your program, and check it again at the beginning of your problem function. Based on the example code you have posted, my guess is you might be writing out of the boundaries of an array in another function, and this could be overwriting the address of stdin, even if you haven't intentionally touched it. Would you mind posting the content of your other functions? – Dissident penguin Nov 26 '13 at 13:15
  • Not sure how to check the addr of stdin, but I have added the two functions before the problem function to the end of the post. – TinMan Nov 26 '13 at 18:03
  • You should check that the NewLineCount, TotalCount and Count variables are smaller than the sizes of the arrays you are indexing with them. Also try writing more[15] = ch; hidden[16] = ch; instead of the double assignment. To check the address of stdin, use your debugger: just add it as a variable you want to watch. Maybe you can try to use command line arguments to get the file name instead of piping through stdin, and verify that you have the same problem. – Dissident penguin Nov 26 '13 at 19:15
  • Can I email you the entire code and txt files? I'm new to c, maybe you can find the problem. – TinMan Nov 27 '13 at 17:42
  • I was already rewriting some parts of your functions, but there is certain information I need to know. Are you using stdin as input because you don't know how to read from the command line? Using stdin greatly limits what you can do while working with files. If you cannot know in advance the size of the file you are going to process, and that information is not contained in the file (file header), your only choice is to first scan the file to find its size and ask for the right amount of memory using malloc. For this you need to "rewind", but on stdin it will flush the input buffer. – Dissident penguin Nov 27 '13 at 17:51
  • Send me the project and I'll take a look at it. You can send it to the e-mail address composed of the two words in my user name separated by an underscore, and then (at) hotmail.com (all in lowercase). – Dissident penguin Nov 27 '13 at 17:58
  • I am only reading from files with fscanf and fgetc. I had no problems with fscanf, but when I had to read a file with spaces, I needed to use fgetc. I'll send you the info in a couple minutes with more info. – TinMan Nov 27 '13 at 18:09
Check which mode you opened the file in, and add error checks to make sure you got valid return values.

You can refer to man fopen to see where each mode positions the stream:

   The fopen() function opens the file whose name is the string pointed to
   by path and associates a stream with it.
   The argument mode points to a string beginning with one of the following
   sequences (Additional characters may follow these sequences.):

   r      Open  text  file  for  reading.  The stream is positioned at the
          beginning of the file.

   r+     Open for reading and writing.  The stream is positioned  at  the
          beginning of the file.

   w      Truncate  file  to  zero length or create text file for writing.
          The stream is positioned at the beginning of the file.

   w+     Open for reading and writing.  The file is created  if  it  does
          not  exist, otherwise it is truncated.  The stream is positioned
          at the beginning of the file.

   a      Open for appending (writing at end of file).  The file is created
          if it does not exist.  The stream is positioned at the end of the
          file.

   a+     Open for reading and appending (writing at end  of  file).   The
          file is created if it does not exist.  The initial file position
          for reading is at the beginning  of  the  file,  but  output  is
          always appended to the end of the file.

Also note that the file should not be larger than 2 GB, or there may be problems.

You can use fseek to set the file position indicator.

You can also use a debugger to watch these variables and see why they hold random values. I think debugging is more efficient than tracing output.
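
A minimal sketch of the error-checked open described above. The helper name `openForReading` is hypothetical, and reporting through `errno` assumes a POSIX-like system, since the C standard does not require `fopen` to set it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open path for reading and report failures. */
static FILE *openForReading(const char *path)
{
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "r");    /* "r": stream positioned at the beginning */
    if (fp == NULL) {
        /* Assumption: errno was set by fopen (true on POSIX systems). */
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
        return NULL;
    }
    if (ftell(fp) != 0L) {    /* sanity check on the start position */
        fprintf(stderr, "unexpected start position in %s\n", path);
    }
    return fp;
}
```

The caller still has to check for a NULL return before passing the stream to a function such as removeCategories().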

thinkinnight
  • Additional info: File size: 617kb, opened for "r". – TinMan Nov 24 '13 at 02:18
  • I tried fseek to set the start position to the beginning with same results. – TinMan Nov 24 '13 at 02:19
  • Can you add an fgetc() call right after the call to removeCategories() to see which character it gets? Is it also a random character? – thinkinnight Nov 24 '13 at 12:23
  • Just tried it, it starts scanning at the beginning of the file. However, if I put fgetc() anywhere inside the while loop, it starts scanning randomly in the file. What might be causing this behavior? – TinMan Nov 24 '13 at 19:23
  • I do not think it is fgetc()'s fault. I have re-checked your code and see that you have updated it. I have tested it, and fgetc() gets the right character, so can you please give more detail, but keep it simple so I can build the same environment to debug? Another note: you defined the arrays incorrectly. The hidden and more arrays you defined are 1 character shorter than expected. The hidden array runs from [0] to [16]; the last character should be [17], and it should be '\0' for strcmp to do the comparison. You can add a printf after the for loop to check this. – thinkinnight Nov 25 '13 at 02:49
  • It seems to be a problem with the length of my file, 617kb. I have tested with smaller files and it works fine. – TinMan Nov 25 '13 at 03:44